Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-29 Thread Chris Wilson
Quoting Michael Sartain (2019-01-29 01:52:12)
> On Mon, Jan 21, 2019, at 4:20 PM, Chris Wilson wrote:
> > Rather than every backend and GPU driver reinventing the same wheel for
> > user level debugging of HW execution, the common dma-fence framework
> > should include the tracing infrastructure required for most client API
> > level flow visualisation.
> > 
> > With these common dma-fence level tracepoints, the userspace tools can
> > establish a detailed view of the client <-> HW flow across different
> > kernels. There is a strong ask to have this available, so that the
> > userspace developer can effectively assess if they're doing a good job
> > about feeding the beast of a GPU hardware.
> ...
> 
> I've got a first pass of this visualizing with gpuvis. Screenshots:
> 
> ; with dma_event tracepoints patch
> https://imgur.com/a/MwvoAYY
> 
> ; with old i915 tracepoints
> https://imgur.com/a/tG2iyHS
> 
> Couple questions...
> 
> With your new dma_event tracepoints patch, we're still getting these
> tracepoints:
> 
>   i915_request_in
>   i915_request_out

These are debugging aids rather than true tracepoints and should
already be covered by trace_printk. They are left in this patch as they
are a slightly different argument to remove (as in they are not directly
replaced by dma-fence tracing).
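
To illustrate the point (purely a sketch, not code from the i915 tree;
the helper name and its arguments are invented), a driver-internal debug
marker can go through trace_printk() and land in the same ftrace ring
buffer without defining a dedicated tracepoint, so it carries no ABI
expectation:

/*
 * Hypothetical driver-internal debug marker emitted via trace_printk()
 * instead of a dedicated tracepoint; the fields are illustrative only.
 */
#include <linux/kernel.h>

static void debug_mark_request_in(u64 ctx, u32 seqno, int port)
{
	trace_printk("request in: ctx=%llu seqno=%u port=%d\n",
		     (unsigned long long)ctx, seqno, port);
}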

>   intel_engine_notify

To be removed upstream very shortly.

> And the in/out tracepoints line up with dma_fence_executes
> (same ctx:seqno and time):
> 
>  <idle>-0 [006]   150.376273: dma_fence_execute_start: context=31, seqno=35670, hwid=0
>  <idle>-0 [006]   150.413215: dma_fence_execute_end: context=31, seqno=35670, hwid=0
> 
>  <idle>-0 [006]   150.376272: i915_request_in:  dev=0, engine=0:0, hw_id=4, ctx=31, seqno=35670, prio=0, global=41230, port=1
>  <idle>-0 [006]   150.413217: i915_request_out: dev=0, engine=0:0, hw_id=4, ctx=31, seqno=35670, global=41230, completed?=1
> 
> However I'm also seeing several i915_request_in --> intel_engine_notify
> tracepoints that don't have dma_fence_execute_* tracepoints:

Yes. I was trying to wean the API off expecting an exact match, and to
have it be happy with context in/out events rather than request-level
details.

> RenderThread-1279  [001]   150.341336: dma_fence_init:   driver=i915 timeline=ShooterGame[1226]/2 context=31 seqno=35669
> RenderThread-1279  [001]   150.341352: dma_fence_emit:   context=31, seqno=35669
>   <idle>-0 [006]   150.376271: i915_request_in:  dev=0, engine=0:0, hw_id=4, ctx=31, seqno=35669, prio=0, global=41229, port=1
>   <idle>-0 [006]   150.411525: intel_engine_notify:  dev=0, engine=0:0, seqno=41229, waiters=1
> RenderThread-1279  [001]   150.419779: dma_fence_signaled:   context=31, seqno=35669
> RenderThread-1279  [001]   150.419838: dma_fence_destroy:    context=31, seqno=35669
> 
> I assume something is going on at a lower level that we can't get the
> information for via dma_fence?

Deliberate obfuscation. It more or less lets us know which client was
running on the GPU at any one time, but you have to work backwards,
inspecting the signaling timeline, to identify exactly which fence.
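
To make that concrete, here is a rough userspace sketch (not part of
the patch and not gpuvis code; the struct names, the overlap rule and
the second fence's timestamps are invented, the rest comes from the
trace excerpt above) of the back-matching a tool could do: given a
context-level interval from dma_fence_execute_start/end, find which
emitted-but-not-yet-signaled fences on that context it can be
attributed to.

/* Attribute a context-level execute span to candidate fences. */
#include <stdio.h>

struct fence_life {            /* one dma-fence on a timeline */
	unsigned int context;
	unsigned int seqno;
	double emit_ts;        /* dma_fence_emit timestamp (s) */
	double signal_ts;      /* dma_fence_signaled timestamp (s) */
};

struct exec_span {             /* one dma_fence_execute_start/end pair */
	unsigned int context;
	double start_ts, end_ts;
};

/* A fence may account for the span if it was emitted before the span
 * ended and had not yet signaled when the span began. */
static int may_account_for(const struct fence_life *f,
			   const struct exec_span *s)
{
	return f->context == s->context &&
	       f->emit_ts <= s->end_ts &&
	       f->signal_ts >= s->start_ts;
}

int main(void)
{
	const struct fence_life fences[] = {
		{ 31, 35669, 150.341352, 150.419779 },
		{ 31, 35670, 150.341500, 150.420000 },	/* made-up values */
	};
	const struct exec_span span = { 31, 150.376273, 150.413215 };
	unsigned int i;

	for (i = 0; i < sizeof(fences) / sizeof(fences[0]); i++)
		if (may_account_for(&fences[i], &span))
			printf("context=%u seqno=%u overlaps the execute span\n",
			       fences[i].context, fences[i].seqno);
	return 0;
}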
-Chris


Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-28 Thread Michael Sartain
On Mon, Jan 21, 2019, at 4:20 PM, Chris Wilson wrote:
> Rather than every backend and GPU driver reinventing the same wheel for
> user level debugging of HW execution, the common dma-fence framework
> should include the tracing infrastructure required for most client API
> level flow visualisation.
> 
> With these common dma-fence level tracepoints, the userspace tools can
> establish a detailed view of the client <-> HW flow across different
> kernels. There is a strong ask to have this available, so that the
> userspace developer can effectively assess if they're doing a good job
> about feeding the beast of a GPU hardware.
...

I've got a first pass of visualizing this with gpuvis. Screenshots:

; with dma_event tracepoints patch
https://imgur.com/a/MwvoAYY

; with old i915 tracepoints
https://imgur.com/a/tG2iyHS

Couple questions...

With your new dma_event tracepoints patch, we're still getting these
tracepoints:

  i915_request_in
  i915_request_out
  intel_engine_notify

And the in/out tracepoints line up with the dma_fence_execute_* events
(same ctx:seqno and time):

 <idle>-0 [006]   150.376273: dma_fence_execute_start: context=31, seqno=35670, hwid=0
 <idle>-0 [006]   150.413215: dma_fence_execute_end: context=31, seqno=35670, hwid=0

 <idle>-0 [006]   150.376272: i915_request_in:  dev=0, engine=0:0, hw_id=4, ctx=31, seqno=35670, prio=0, global=41230, port=1
 <idle>-0 [006]   150.413217: i915_request_out: dev=0, engine=0:0, hw_id=4, ctx=31, seqno=35670, global=41230, completed?=1

However I'm also seeing several i915_request_in --> intel_engine_notify
tracepoints that don't have dma_fence_execute_* tracepoints:

RenderThread-1279  [001]   150.341336: dma_fence_init:   driver=i915 timeline=ShooterGame[1226]/2 context=31 seqno=35669
RenderThread-1279  [001]   150.341352: dma_fence_emit:   context=31, seqno=35669
  <idle>-0 [006]   150.376271: i915_request_in:  dev=0, engine=0:0, hw_id=4, ctx=31, seqno=35669, prio=0, global=41229, port=1
  <idle>-0 [006]   150.411525: intel_engine_notify:  dev=0, engine=0:0, seqno=41229, waiters=1
RenderThread-1279  [001]   150.419779: dma_fence_signaled:   context=31, seqno=35669
RenderThread-1279  [001]   150.419838: dma_fence_destroy:    context=31, seqno=35669

I assume something is going on at a lower level that we can't get
information about via dma_fence?

Thanks!


Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-24 Thread Eric Anholt
Chris Wilson  writes:

> Quoting Koenig, Christian (2019-01-22 08:49:30)
>> Am 22.01.19 um 00:20 schrieb Chris Wilson:
>> > Rather than every backend and GPU driver reinventing the same wheel for
>> > user level debugging of HW execution, the common dma-fence framework
>> > should include the tracing infrastructure required for most client API
>> > level flow visualisation.
>> >
>> > With these common dma-fence level tracepoints, the userspace tools can
>> > establish a detailed view of the client <-> HW flow across different
>> > kernels. There is a strong ask to have this available, so that the
>> > userspace developer can effectively assess if they're doing a good job
>> > about feeding the beast of a GPU hardware.
>> >
>> > In the case of needing to look into more fine-grained details of how
>> > kernel internals work towards the goal of feeding the beast, the tools
>> > may optionally amend the dma-fence tracing information with the driver
>> > implementation specific. But for such cases, the tools should have a
>> > graceful degradation in case the expected extra tracepoints have
>> > changed or their format differs from the expected, as the kernel
>> > implementation internals are not expected to stay the same.
>> >
>> > It is important to distinguish between tracing for the purpose of client
>> > flow visualisation and tracing for the purpose of low-level kernel
>> > debugging. The latter is highly implementation specific, tied to
>> > a particular HW and driver, whereas the former addresses a common goal
>> > of user level tracing and likely a common set of userspace tools.
>> > Having made the distinction that these tracepoints will be consumed for
>> > client API tooling, we raise the spectre of tracepoint ABI stability. It
>> > is hoped that by defining a common set of dma-fence tracepoints, we avoid
>> > the pitfall of exposing low level details and so restrict ourselves only
>> > to the high level flow that is applicable to all drivers and hardware.
>> > Thus the reserved guarantee that this set of tracepoints will be stable
>> > (with the emphasis on depicting client <-> HW flow as opposed to
>> > driver <-> HW).
>> >
>> > In terms of specific changes to the dma-fence tracing, we remove the
>> > emission of the strings for every tracepoint (reserving them for
>> > dma_fence_init for cases where they have unique dma_fence_ops, and
>> > preferring to have descriptors for the whole fence context). strings do
>> > not pack as well into the ftrace ringbuffer and we would prefer to
>> > reduce the amount of indirect callbacks required for frequent tracepoint
>> > emission.
>> >
>> > Signed-off-by: Chris Wilson 
>> > Cc: Joonas Lahtinen 
>> > Cc: Tvrtko Ursulin 
>> > Cc: Alex Deucher 
>> > Cc: "Christian König" 
>> > Cc: Eric Anholt 
>> > Cc: Pierre-Loup Griffais 
>> > Cc: Michael Sartain 
>> > Cc: Steven Rostedt 
>> 
>> In general yes please! If possible please separate out the changes to 
>> the common dma_fence infrastructure from the i915 changes.
>
> Sure, I was just stressing the impact: remove some randomly placed
> internal debugging tracepoints, try to define useful ones instead :)
>
> On the list of things to do was to convert at least 2 other drivers
> (I was thinking nouveau/msm for simplicity, vc4 for a simpler
> introduction to drm_sched than amdgpu) over to be sure we have the right
> tracepoints.

v3d is using gpu-scheduler, and I'd love to see it using some shared
tracepoints -- I put in some of what we'd need for visualization, but I
haven't actually built visualization yet so I'm not sure it's good
enough.

vc4 isn't using gpu-scheduler yet.  I'm interested in it -- there's the
user qpu pipeline that we should expose, but supporting another pipeline
without the shared scheduler is no fun.




Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-22 Thread Daniel Vetter
On Tue, Jan 22, 2019 at 10:58 AM Chris Wilson  wrote:
>
> Quoting Daniel Vetter (2019-01-22 09:11:53)
> > On Tue, Jan 22, 2019 at 10:06 AM Chris Wilson  
> > wrote:
> > >
> > > Quoting Koenig, Christian (2019-01-22 08:49:30)
> > > > Am 22.01.19 um 00:20 schrieb Chris Wilson:
> > > > > Rather than every backend and GPU driver reinventing the same wheel 
> > > > > for
> > > > > user level debugging of HW execution, the common dma-fence framework
> > > > > should include the tracing infrastructure required for most client API
> > > > > level flow visualisation.
> > > > >
> > > > > With these common dma-fence level tracepoints, the userspace tools can
> > > > > establish a detailed view of the client <-> HW flow across different
> > > > > kernels. There is a strong ask to have this available, so that the
> > > > > userspace developer can effectively assess if they're doing a good job
> > > > > about feeding the beast of a GPU hardware.
> > > > >
> > > > > In the case of needing to look into more fine-grained details of how
> > > > > kernel internals work towards the goal of feeding the beast, the tools
> > > > > may optionally amend the dma-fence tracing information with the driver
> > > > > implementation specific. But for such cases, the tools should have a
> > > > > graceful degradation in case the expected extra tracepoints have
> > > > > changed or their format differs from the expected, as the kernel
> > > > > implementation internals are not expected to stay the same.
> > > > >
> > > > > It is important to distinguish between tracing for the purpose of 
> > > > > client
> > > > > flow visualisation and tracing for the purpose of low-level kernel
> > > > > debugging. The latter is highly implementation specific, tied to
> > > > > a particular HW and driver, whereas the former addresses a common goal
> > > > > of user level tracing and likely a common set of userspace tools.
> > > > > Having made the distinction that these tracepoints will be consumed 
> > > > > for
> > > > > client API tooling, we raise the spectre of tracepoint ABI stability. 
> > > > > It
> > > > > is hoped that by defining a common set of dma-fence tracepoints, we 
> > > > > avoid
> > > > > the pitfall of exposing low level details and so restrict ourselves 
> > > > > only
> > > > > to the high level flow that is applicable to all drivers and hardware.
> > > > > Thus the reserved guarantee that this set of tracepoints will be 
> > > > > stable
> > > > > (with the emphasis on depicting client <-> HW flow as opposed to
> > > > > driver <-> HW).
> > > > >
> > > > > In terms of specific changes to the dma-fence tracing, we remove the
> > > > > emission of the strings for every tracepoint (reserving them for
> > > > > dma_fence_init for cases where they have unique dma_fence_ops, and
> > > > > preferring to have descriptors for the whole fence context). strings 
> > > > > do
> > > > > not pack as well into the ftrace ringbuffer and we would prefer to
> > > > > reduce the amount of indirect callbacks required for frequent 
> > > > > tracepoint
> > > > > emission.
> > > > >
> > > > > Signed-off-by: Chris Wilson 
> > > > > Cc: Joonas Lahtinen 
> > > > > Cc: Tvrtko Ursulin 
> > > > > Cc: Alex Deucher 
> > > > > Cc: "Christian König" 
> > > > > Cc: Eric Anholt 
> > > > > Cc: Pierre-Loup Griffais 
> > > > > Cc: Michael Sartain 
> > > > > Cc: Steven Rostedt 
> > > >
> > > > In general yes please! If possible please separate out the changes to
> > > > the common dma_fence infrastructure from the i915 changes.
> > >
> > > Sure, I was just stressing the impact: remove some randomly placed
> > > internal debugging tracepoints, try to define useful ones instead :)
> > >
> > > On the list of things to do was to convert at least 2 other drivers
> > > (I was thinking nouveau/msm for simplicity, vc4 for a simpler
> > > introduction to drm_sched than amdgpu) over to be sure we have the right
> > > tracepoints.
> >
> > I think sprinkling these over the scheduler (maybe just as an opt-in,
> > for the case where the driver doesn't have some additional queueing
> > somewhere) would be good. I haven't checked whether it fits, but would
> > give you a bunch of drivers at once. It might also not cover all the
> > cases (I guess the wait related ones would need to be somewhere else).
>
> And the other thing (that got explicitly asked for!) was that we have
> some igt to make sure we don't surreptitiously break the tracepoints
> in future.
>
> Another task would be to devise the set of tracepoints to describe the
> modesetting flow; that more or less is the flow of atomic helpers I
> guess: prepare; wait-on-fences; commit; signal; cleanup. For system
> snooping, knowing a target frame (msc or ust) and how late it was
> delayed and the HW execution flow up to the frame and being able to tie
> that back to the GL/VK client is the grand plan.

Yeah with atomic helpers this should be doable, as long as the driver
uses the commit tracking part of the helpers. That's the 

Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-22 Thread Chris Wilson
Quoting Daniel Vetter (2019-01-22 09:11:53)
> On Tue, Jan 22, 2019 at 10:06 AM Chris Wilson  
> wrote:
> >
> > Quoting Koenig, Christian (2019-01-22 08:49:30)
> > > Am 22.01.19 um 00:20 schrieb Chris Wilson:
> > > > Rather than every backend and GPU driver reinventing the same wheel for
> > > > user level debugging of HW execution, the common dma-fence framework
> > > > should include the tracing infrastructure required for most client API
> > > > level flow visualisation.
> > > >
> > > > With these common dma-fence level tracepoints, the userspace tools can
> > > > establish a detailed view of the client <-> HW flow across different
> > > > kernels. There is a strong ask to have this available, so that the
> > > > userspace developer can effectively assess if they're doing a good job
> > > > about feeding the beast of a GPU hardware.
> > > >
> > > > In the case of needing to look into more fine-grained details of how
> > > > kernel internals work towards the goal of feeding the beast, the tools
> > > > may optionally amend the dma-fence tracing information with the driver
> > > > implementation specific. But for such cases, the tools should have a
> > > > graceful degradation in case the expected extra tracepoints have
> > > > changed or their format differs from the expected, as the kernel
> > > > implementation internals are not expected to stay the same.
> > > >
> > > > It is important to distinguish between tracing for the purpose of client
> > > > flow visualisation and tracing for the purpose of low-level kernel
> > > > debugging. The latter is highly implementation specific, tied to
> > > > a particular HW and driver, whereas the former addresses a common goal
> > > > of user level tracing and likely a common set of userspace tools.
> > > > Having made the distinction that these tracepoints will be consumed for
> > > > client API tooling, we raise the spectre of tracepoint ABI stability. It
> > > > is hoped that by defining a common set of dma-fence tracepoints, we 
> > > > avoid
> > > > the pitfall of exposing low level details and so restrict ourselves only
> > > > to the high level flow that is applicable to all drivers and hardware.
> > > > Thus the reserved guarantee that this set of tracepoints will be stable
> > > > (with the emphasis on depicting client <-> HW flow as opposed to
> > > > driver <-> HW).
> > > >
> > > > In terms of specific changes to the dma-fence tracing, we remove the
> > > > emission of the strings for every tracepoint (reserving them for
> > > > dma_fence_init for cases where they have unique dma_fence_ops, and
> > > > preferring to have descriptors for the whole fence context). strings do
> > > > not pack as well into the ftrace ringbuffer and we would prefer to
> > > > reduce the amount of indirect callbacks required for frequent tracepoint
> > > > emission.
> > > >
> > > > Signed-off-by: Chris Wilson 
> > > > Cc: Joonas Lahtinen 
> > > > Cc: Tvrtko Ursulin 
> > > > Cc: Alex Deucher 
> > > > Cc: "Christian König" 
> > > > Cc: Eric Anholt 
> > > > Cc: Pierre-Loup Griffais 
> > > > Cc: Michael Sartain 
> > > > Cc: Steven Rostedt 
> > >
> > > In general yes please! If possible please separate out the changes to
> > > the common dma_fence infrastructure from the i915 changes.
> >
> > Sure, I was just stressing the impact: remove some randomly placed
> > internal debugging tracepoints, try to define useful ones instead :)
> >
> > On the list of things to do was to convert at least 2 other drivers
> > (I was thinking nouveau/msm for simplicity, vc4 for a simpler
> > introduction to drm_sched than amdgpu) over to be sure we have the right
> > tracepoints.
> 
> I think sprinkling these over the scheduler (maybe just as an opt-in,
> for the case where the driver doesn't have some additional queueing
> somewhere) would be good. I haven't checked whether it fits, but would
> give you a bunch of drivers at once. It might also not cover all the
> cases (I guess the wait related ones would need to be somewhere else).

And the other thing (that got explicitly asked for!) was to have some
igt tests to make sure we don't surreptitiously break the tracepoints in
the future.

Another task would be to devise the set of tracepoints to describe the
modesetting flow; that is more or less the flow of the atomic helpers, I
guess: prepare; wait-on-fences; commit; signal; cleanup. For system
snooping, the grand plan is to know a target frame (msc or ust), how
late it was delayed, and the HW execution flow leading up to that frame,
and to be able to tie all of that back to the GL/VK client.
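
Purely as an illustrative sketch (the trace system, event names and
fields below are invented, nothing here is implemented), such flow
tracepoints could follow the usual TRACE_EVENT pattern, e.g. for the
commit and completion steps:

#undef TRACE_SYSTEM
#define TRACE_SYSTEM drm_atomic_flow

#if !defined(_TRACE_DRM_ATOMIC_FLOW_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_DRM_ATOMIC_FLOW_H

#include <linux/types.h>
#include <linux/tracepoint.h>

/* hypothetical: commit submitted for a target frame */
TRACE_EVENT(atomic_commit_start,
	TP_PROTO(unsigned int crtc, u64 target_msc),
	TP_ARGS(crtc, target_msc),
	TP_STRUCT__entry(
		__field(unsigned int, crtc)
		__field(u64, target_msc)
	),
	TP_fast_assign(
		__entry->crtc = crtc;
		__entry->target_msc = target_msc;
	),
	TP_printk("crtc=%u target_msc=%llu", __entry->crtc,
		  (unsigned long long)__entry->target_msc)
);

/* hypothetical: commit completed, actual frame it landed on */
TRACE_EVENT(atomic_commit_done,
	TP_PROTO(unsigned int crtc, u64 msc),
	TP_ARGS(crtc, msc),
	TP_STRUCT__entry(
		__field(unsigned int, crtc)
		__field(u64, msc)
	),
	TP_fast_assign(
		__entry->crtc = crtc;
		__entry->msc = msc;
	),
	TP_printk("crtc=%u msc=%llu", __entry->crtc,
		  (unsigned long long)__entry->msc)
);

#endif /* _TRACE_DRM_ATOMIC_FLOW_H */

#include <trace/define_trace.h>

Comparing target_msc against the msc reported at completion would give
the per-frame delay directly.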
-Chris


Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-22 Thread Daniel Vetter
On Tue, Jan 22, 2019 at 10:06 AM Chris Wilson  wrote:
>
> Quoting Koenig, Christian (2019-01-22 08:49:30)
> > Am 22.01.19 um 00:20 schrieb Chris Wilson:
> > > Rather than every backend and GPU driver reinventing the same wheel for
> > > user level debugging of HW execution, the common dma-fence framework
> > > should include the tracing infrastructure required for most client API
> > > level flow visualisation.
> > >
> > > With these common dma-fence level tracepoints, the userspace tools can
> > > establish a detailed view of the client <-> HW flow across different
> > > kernels. There is a strong ask to have this available, so that the
> > > userspace developer can effectively assess if they're doing a good job
> > > about feeding the beast of a GPU hardware.
> > >
> > > In the case of needing to look into more fine-grained details of how
> > > kernel internals work towards the goal of feeding the beast, the tools
> > > may optionally amend the dma-fence tracing information with the driver
> > > implementation specific. But for such cases, the tools should have a
> > > graceful degradation in case the expected extra tracepoints have
> > > changed or their format differs from the expected, as the kernel
> > > implementation internals are not expected to stay the same.
> > >
> > > It is important to distinguish between tracing for the purpose of client
> > > flow visualisation and tracing for the purpose of low-level kernel
> > > debugging. The latter is highly implementation specific, tied to
> > > a particular HW and driver, whereas the former addresses a common goal
> > > of user level tracing and likely a common set of userspace tools.
> > > Having made the distinction that these tracepoints will be consumed for
> > > client API tooling, we raise the spectre of tracepoint ABI stability. It
> > > is hoped that by defining a common set of dma-fence tracepoints, we avoid
> > > the pitfall of exposing low level details and so restrict ourselves only
> > > to the high level flow that is applicable to all drivers and hardware.
> > > Thus the reserved guarantee that this set of tracepoints will be stable
> > > (with the emphasis on depicting client <-> HW flow as opposed to
> > > driver <-> HW).
> > >
> > > In terms of specific changes to the dma-fence tracing, we remove the
> > > emission of the strings for every tracepoint (reserving them for
> > > dma_fence_init for cases where they have unique dma_fence_ops, and
> > > preferring to have descriptors for the whole fence context). strings do
> > > not pack as well into the ftrace ringbuffer and we would prefer to
> > > reduce the amount of indirect callbacks required for frequent tracepoint
> > > emission.
> > >
> > > Signed-off-by: Chris Wilson 
> > > Cc: Joonas Lahtinen 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Alex Deucher 
> > > Cc: "Christian König" 
> > > Cc: Eric Anholt 
> > > Cc: Pierre-Loup Griffais 
> > > Cc: Michael Sartain 
> > > Cc: Steven Rostedt 
> >
> > In general yes please! If possible please separate out the changes to
> > the common dma_fence infrastructure from the i915 changes.
>
> Sure, I was just stressing the impact: remove some randomly placed
> internal debugging tracepoints, try to define useful ones instead :)
>
> On the list of things to do was to convert at least 2 other drivers
> (I was thinking nouveau/msm for simplicity, vc4 for a simpler
> introduction to drm_sched than amdgpu) over to be sure we have the right
> tracepoints.

I think sprinkling these over the scheduler (maybe just as an opt-in,
for the case where the driver doesn't have some additional queueing
somewhere) would be good. I haven't checked whether it fits, but it
would give you a bunch of drivers at once. It might also not cover all
the cases (I guess the wait-related ones would need to be somewhere
else).
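
Hand-waving sketch only (this is not drm_sched code and the example_*
names are invented): the opt-in could be as small as the driver emitting
the new execute tracepoint against the scheduler's finished fence from
its run_job() callback, with the end event coming from wherever HW
fence completion is handled.

/*
 * Illustrative sketch of a drm_sched-based driver opting in to the
 * proposed dma-fence execute tracepoints; example_* is hypothetical.
 */
#include <linux/smp.h>
#include <drm/gpu_scheduler.h>
#include <trace/events/dma_fence.h>

static struct dma_fence *example_submit_to_hw(struct drm_sched_job *job);

static struct dma_fence *example_run_job(struct drm_sched_job *sched_job)
{
	/* The client-visible fence for this job is the scheduler's
	 * finished fence, so trace execution against that. */
	trace_dma_fence_execute_start(&sched_job->s_fence->finished,
				      smp_processor_id());

	/* dma_fence_execute_end would be emitted once the returned HW
	 * fence signals, e.g. from the driver's completion handler. */
	return example_submit_to_hw(sched_job);
}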
-Daniel

> > One thing I'm wondering is why the enable_signaling trace point doesn't
> > need to be exported any more. Is that only used internally in the common
> > infrastructure?
>
> Right. Only used inside the core, and I don't see much call for making
> it easy for drivers to fiddle around bypassing the core
> enable_signaling/signal. (I'm not sure it's useful for client flow
> either, it feels more like dma-fence debugging, but they can just
> not listen to that tracepoint.)
> -Chris



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-22 Thread Chris Wilson
Quoting Koenig, Christian (2019-01-22 08:49:30)
> Am 22.01.19 um 00:20 schrieb Chris Wilson:
> > Rather than every backend and GPU driver reinventing the same wheel for
> > user level debugging of HW execution, the common dma-fence framework
> > should include the tracing infrastructure required for most client API
> > level flow visualisation.
> >
> > With these common dma-fence level tracepoints, the userspace tools can
> > establish a detailed view of the client <-> HW flow across different
> > kernels. There is a strong ask to have this available, so that the
> > userspace developer can effectively assess if they're doing a good job
> > about feeding the beast of a GPU hardware.
> >
> > In the case of needing to look into more fine-grained details of how
> > kernel internals work towards the goal of feeding the beast, the tools
> > may optionally amend the dma-fence tracing information with the driver
> > implementation specific. But for such cases, the tools should have a
> > graceful degradation in case the expected extra tracepoints have
> > changed or their format differs from the expected, as the kernel
> > implementation internals are not expected to stay the same.
> >
> > It is important to distinguish between tracing for the purpose of client
> > flow visualisation and tracing for the purpose of low-level kernel
> > debugging. The latter is highly implementation specific, tied to
> > a particular HW and driver, whereas the former addresses a common goal
> > of user level tracing and likely a common set of userspace tools.
> > Having made the distinction that these tracepoints will be consumed for
> > client API tooling, we raise the spectre of tracepoint ABI stability. It
> > is hoped that by defining a common set of dma-fence tracepoints, we avoid
> > the pitfall of exposing low level details and so restrict ourselves only
> > to the high level flow that is applicable to all drivers and hardware.
> > Thus the reserved guarantee that this set of tracepoints will be stable
> > (with the emphasis on depicting client <-> HW flow as opposed to
> > driver <-> HW).
> >
> > In terms of specific changes to the dma-fence tracing, we remove the
> > emission of the strings for every tracepoint (reserving them for
> > dma_fence_init for cases where they have unique dma_fence_ops, and
> > preferring to have descriptors for the whole fence context). strings do
> > not pack as well into the ftrace ringbuffer and we would prefer to
> > reduce the amount of indirect callbacks required for frequent tracepoint
> > emission.
> >
> > Signed-off-by: Chris Wilson 
> > Cc: Joonas Lahtinen 
> > Cc: Tvrtko Ursulin 
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: Eric Anholt 
> > Cc: Pierre-Loup Griffais 
> > Cc: Michael Sartain 
> > Cc: Steven Rostedt 
> 
> In general yes please! If possible please separate out the changes to 
> the common dma_fence infrastructure from the i915 changes.

Sure, I was just stressing the impact: remove some randomly placed
internal debugging tracepoints, try to define useful ones instead :)

On the list of things to do was to convert at least 2 other drivers
(I was thinking nouveau/msm for simplicity, vc4 for a simpler
introduction to drm_sched than amdgpu) over to be sure we have the right
tracepoints.
 
> One thing I'm wondering is why the enable_signaling trace point doesn't 
> need to be exported any more. Is that only used internally in the common 
> infrastructure?

Right. Only used inside the core, and I don't see much call for making
it easy for drivers to fiddle around bypassing the core
enable_signaling/signal. (I'm not sure it's useful for client flow
either, it feels more like dma-fence debugging, but they can just
not listen to that tracepoint.)
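
Roughly speaking (a paraphrased sketch, not the actual source, with
locking and the signaled-bit bookkeeping omitted), the only call site
sits in a core helper along these lines, so no module ever references
the tracepoint symbol and the EXPORT_TRACEPOINT_SYMBOL() can simply be
dropped:

/* Paraphrased core-only call site; simplified for illustration. */
#include <linux/dma-fence.h>
#include <trace/events/dma_fence.h>

static void core_enable_signaling(struct dma_fence *fence)
{
	trace_dma_fence_enable_signal(fence);	/* core-internal only */

	if (fence->ops->enable_signaling &&
	    !fence->ops->enable_signaling(fence))
		dma_fence_signal(fence);
}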
-Chris


Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-22 Thread Koenig, Christian
Am 22.01.19 um 00:20 schrieb Chris Wilson:
> Rather than every backend and GPU driver reinventing the same wheel for
> user level debugging of HW execution, the common dma-fence framework
> should include the tracing infrastructure required for most client API
> level flow visualisation.
>
> With these common dma-fence level tracepoints, the userspace tools can
> establish a detailed view of the client <-> HW flow across different
> kernels. There is a strong ask to have this available, so that the
> userspace developer can effectively assess if they're doing a good job
> about feeding the beast of a GPU hardware.
>
> In the case of needing to look into more fine-grained details of how
> kernel internals work towards the goal of feeding the beast, the tools
> may optionally amend the dma-fence tracing information with the driver
> implementation specific. But for such cases, the tools should have a
> graceful degradation in case the expected extra tracepoints have
> changed or their format differs from the expected, as the kernel
> implementation internals are not expected to stay the same.
>
> It is important to distinguish between tracing for the purpose of client
> flow visualisation and tracing for the purpose of low-level kernel
> debugging. The latter is highly implementation specific, tied to
> a particular HW and driver, whereas the former addresses a common goal
> of user level tracing and likely a common set of userspace tools.
> Having made the distinction that these tracepoints will be consumed for
> client API tooling, we raise the spectre of tracepoint ABI stability. It
> is hoped that by defining a common set of dma-fence tracepoints, we avoid
> the pitfall of exposing low level details and so restrict ourselves only
> to the high level flow that is applicable to all drivers and hardware.
> Thus the reserved guarantee that this set of tracepoints will be stable
> (with the emphasis on depicting client <-> HW flow as opposed to
> driver <-> HW).
>
> In terms of specific changes to the dma-fence tracing, we remove the
> emission of the strings for every tracepoint (reserving them for
> dma_fence_init for cases where they have unique dma_fence_ops, and
> preferring to have descriptors for the whole fence context). strings do
> not pack as well into the ftrace ringbuffer and we would prefer to
> reduce the amount of indirect callbacks required for frequent tracepoint
> emission.
>
> Signed-off-by: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Tvrtko Ursulin 
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: Eric Anholt 
> Cc: Pierre-Loup Griffais 
> Cc: Michael Sartain 
> Cc: Steven Rostedt 

In general yes please! If possible please separate out the changes to 
the common dma_fence infrastructure from the i915 changes.

One thing I'm wondering is why the enable_signaling trace point doesn't 
need to be exported any more. Is that only used internally in the common 
infrastructure?

Apart from that I'm on sick leave today, so give me at least a few days 
to recover and take a closer look.

Thanks,
Christian.

> ---
>   drivers/dma-buf/dma-fence.c |   9 +-
>   drivers/gpu/drm/i915/i915_gem_clflush.c |   5 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c  |   1 -
>   drivers/gpu/drm/i915/i915_request.c |  16 +-
>   drivers/gpu/drm/i915/i915_timeline.c|   5 +
>   drivers/gpu/drm/i915/i915_trace.h   | 134 ---
>   drivers/gpu/drm/i915/intel_guc_submission.c |  10 ++
>   drivers/gpu/drm/i915/intel_lrc.c|   6 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>   include/trace/events/dma_fence.h| 177 +++-
>   10 files changed, 214 insertions(+), 151 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 3aa8733f832a..5c93ed34b1ff 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -27,8 +27,15 @@
>   #define CREATE_TRACE_POINTS
>   #include 
>   
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_context_create);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_context_destroy);
> +
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_await);
>   EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
> -EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_execute_start);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_execute_end);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_wait_start);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_wait_end);
>   
>   static DEFINE_SPINLOCK(dma_fence_stub_lock);
>   static struct dma_fence dma_fence_stub;
> diff --git a/drivers/gpu/drm/i915/i915_gem_clflush.c 
> b/drivers/gpu/drm/i915/i915_gem_clflush.c
> index 8e74c23cbd91..435c1303ecc8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_clflush.c
> +++ b/drivers/gpu/drm/i915/i915_gem_clflush.c
> @@ -22,6 +22,8 @@
>*
>*/
>   
> +#include 
> +
>   #include "i915_drv.h"
>   #include "intel_frontbuffer.h"
>   #include "i915_gem_clflush.h"
> @@ -73,6 +75,7 @@ static void 

[Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-21 Thread Chris Wilson
Rather than every backend and GPU driver reinventing the same wheel for
user level debugging of HW execution, the common dma-fence framework
should include the tracing infrastructure required for most client API
level flow visualisation.

With these common dma-fence level tracepoints, the userspace tools can
establish a detailed view of the client <-> HW flow across different
kernels. There is a strong ask to have this available, so that the
userspace developer can effectively assess whether they're doing a good
job of feeding the beast that is the GPU hardware.

In the case of needing to look into more fine-grained details of how
the kernel internals work towards the goal of feeding the beast, the
tools may optionally amend the dma-fence tracing information with
driver-implementation specifics. But for such cases, the tools should
degrade gracefully if the expected extra tracepoints have changed or
their format differs from what is expected, as the kernel implementation
internals are not expected to stay the same.

It is important to distinguish between tracing for the purpose of client
flow visualisation and tracing for the purpose of low-level kernel
debugging. The latter is highly implementation specific, tied to
a particular HW and driver, whereas the former addresses a common goal
of user level tracing and likely a common set of userspace tools.
Having made the distinction that these tracepoints will be consumed for
client API tooling, we raise the spectre of tracepoint ABI stability. It
is hoped that by defining a common set of dma-fence tracepoints, we avoid
the pitfall of exposing low level details and so restrict ourselves only
to the high level flow that is applicable to all drivers and hardware.
Thus the reserved guarantee that this set of tracepoints will be stable
(with the emphasis on depicting client <-> HW flow as opposed to
driver <-> HW).

In terms of specific changes to the dma-fence tracing, we remove the
emission of the strings for every tracepoint (reserving them for
dma_fence_init, for cases where they have unique dma_fence_ops, and
preferring to have descriptors for the whole fence context). Strings do
not pack as well into the ftrace ringbuffer, and we would prefer to
reduce the number of indirect callbacks required for frequent tracepoint
emission.

Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: Eric Anholt 
Cc: Pierre-Loup Griffais 
Cc: Michael Sartain 
Cc: Steven Rostedt 
---
 drivers/dma-buf/dma-fence.c |   9 +-
 drivers/gpu/drm/i915/i915_gem_clflush.c |   5 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c  |   1 -
 drivers/gpu/drm/i915/i915_request.c |  16 +-
 drivers/gpu/drm/i915/i915_timeline.c|   5 +
 drivers/gpu/drm/i915/i915_trace.h   | 134 ---
 drivers/gpu/drm/i915/intel_guc_submission.c |  10 ++
 drivers/gpu/drm/i915/intel_lrc.c|   6 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 include/trace/events/dma_fence.h| 177 +++-
 10 files changed, 214 insertions(+), 151 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 3aa8733f832a..5c93ed34b1ff 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -27,8 +27,15 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/dma_fence.h>
 
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_context_create);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_context_destroy);
+
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_await);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
-EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_execute_start);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_execute_end);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_wait_start);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_wait_end);
 
 static DEFINE_SPINLOCK(dma_fence_stub_lock);
 static struct dma_fence dma_fence_stub;
diff --git a/drivers/gpu/drm/i915/i915_gem_clflush.c 
b/drivers/gpu/drm/i915/i915_gem_clflush.c
index 8e74c23cbd91..435c1303ecc8 100644
--- a/drivers/gpu/drm/i915/i915_gem_clflush.c
+++ b/drivers/gpu/drm/i915/i915_gem_clflush.c
@@ -22,6 +22,8 @@
  *
  */
 
+#include <trace/events/dma_fence.h>
+
 #include "i915_drv.h"
 #include "intel_frontbuffer.h"
 #include "i915_gem_clflush.h"
@@ -73,6 +75,7 @@ static void i915_clflush_work(struct work_struct *work)
struct clflush *clflush = container_of(work, typeof(*clflush), work);
struct drm_i915_gem_object *obj = clflush->obj;
 
+   trace_dma_fence_execute_start(&clflush->dma, smp_processor_id());
if (i915_gem_object_pin_pages(obj)) {
DRM_ERROR("Failed to acquire obj->pages for clflushing\n");
goto out;
@@ -83,6 +86,7 @@ static void i915_clflush_work(struct work_struct *work)
i915_gem_object_unpin_pages(obj);
 
 out:
+   trace_dma_fence_execute_end(&clflush->dma, smp_processor_id());
i915_gem_object_put(obj);
 
dma_fence_signal(&clflush->dma);
@@ -97,6 +101,7 @@ i915_clflush_notify(struct