Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-28 Thread Rogovin, Kevin
Hi all,

Just a gentle poke.

Even though a serious issue was already found by Chris Wilson with batchbuffer 
migration, I would like folks to look at the series (in particular the 
monster patch 16) and give comments. With those comments I will then create a 
v2 (indeed, I have already implemented fixes for the issue that Chris pointed 
out with batchbuffer migration and for a pair of issues I noticed in the script 
in patch 17).

Best Regards,
 -Kevin



-Original Message-
From: Rogovin, Kevin 
Sent: Wednesday, September 27, 2017 2:38 PM
To: Chris Wilson <ch...@chris-wilson.co.uk>; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Hi,

 If we just want to send the data from the trace to the kernel, I can do that 
very easily: just make such a GEM BO, consisting of dword-pairs of 
(TraceCallID, BatchbufferOffset). That would be a small buffer and, together 
with the apitrace file, would give complete data. 

 I could probably make such a dedicated tool quite quickly, or add that 
functionality to the logger.

-Kevin

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Wednesday, September 27, 2017 1:21 PM
To: Rogovin, Kevin <kevin.rogo...@intel.com>; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-27 07:53:29)
> Hi,
> 
>  Right now the way the thing works is that it walks the batchbuffer just 
> after the kernel returns from the ioctl and updates its internal view of the 
> GPU state as it walks and emits to the log file the data. The log on a single 
> batchbuffer is (essentially) just a list of call ID's from the apitrace 
> together of "where in the batchbuffer" that call started. 
> 
>  I confess that I had not realized the potential application for using 
> something like this to help diagnose GPU hangs! I think it is a really good 
> idea. What I could do is the following (and it is not terribly hard to do):
> 
>1. -BEFORE- issuing the ioctl, the logger walks just the api markers in 
> the log of the batchbuffer, and makes a new GEM BO filled with apitrace data 
> (call ID, and maybe GL function data) and modify the ioctl to have an extra 
> buffer.

Yes. With the current intel_batchbuffer.c this should be relatively easy (I 
suggest you limit yourself to recent kernels for that simplification); see 
EXEC_BATCH_FIRST and remember to mark the trace bo as EXEC_OBJECT_CAPTURE.
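
(For illustration, a minimal sketch of appending such a capture BO to the
execbuffer2 call; the structs, flags and ioctl come from i915_drm.h, while the
helper name and the assumption that the batch already sits at index 0 via
I915_EXEC_BATCH_FIRST are illustrative rather than the logger's actual code:)

#include <string.h>
#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

static int
execbuffer2_with_trace_bo(int fd, struct drm_i915_gem_execbuffer2 *execbuf,
                          uint32_t trace_bo_handle)
{
   /* Assumes the driver already uses I915_EXEC_BATCH_FIRST (recent kernels),
    * so the batch stays at index 0 and the trace BO can simply be appended
    * at the end of the object list. */
   struct drm_i915_gem_exec_object2 *old =
      (struct drm_i915_gem_exec_object2 *)(uintptr_t)execbuf->buffers_ptr;
   unsigned count = execbuf->buffer_count;
   struct drm_i915_gem_exec_object2 objs[count + 1];

   memcpy(objs, old, count * sizeof(*old));
   memset(&objs[count], 0, sizeof(objs[count]));
   objs[count].handle = trace_bo_handle;
   /* EXEC_OBJECT_CAPTURE asks the kernel to include this BO's contents in
    * the error state it writes on a GPU hang. */
   objs[count].flags = EXEC_OBJECT_CAPTURE;

   execbuf->buffers_ptr = (uintptr_t)objs;
   execbuf->buffer_count = count + 1;

   return drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf);
}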
 
>   2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
> BO; In order to read the GPU state more accurately I need to walk the log and 
> update the GPU state after the ioctl (mostly out of paranoia for values 
> copied from BO's to pipeline registers).

Up to you, but my paranoia goes the other way. Once the ioctl returns the hw is 
indeed using that memory, so I have less trust in it. If you need to tie the 
relocated pointers to the trace, I would also emit relocations into the trace. 
For post-mortem GPU hang debugging, I would want the execbuf to be complete 
before the ioctl, rather than relying on post-processing.
 
> What would happen, is that if a batchbuffer made the GPU hang, you would then 
> know all the GL commands (trace ID's from the API trace) that made stuff on 
> that batchbuffer. Then one could go back to the apitrace of the troublesome 
> application  and have a much better starting place to debug.

Yup. As time goes on, I hope this becomes a more complete flight recorder, so 
that we don't have to rely on referencing back to a separate trace to work out 
the interesting calls. My goal is that you can give one instruction (that 
doesn't require any additional dependencies, so it can just be 
LD_PRELOAD=i965-fdr.so, or better a script installed in mesa-utils?) to a bug 
reporter and that will then capture enough information.
 
> We could also do something evil looking and put another modification on 
> apitrace where it can have a list of call trace ranges where it inserts 
> glFinish after each call. Those glFinish()'s will then force the ioctl of the 
> exact troublesome draw call without needing to tell i965 to flush after each 
> draw call.
> 
> Just to make sure, you want the "apitrace" data (call ID list, maybe function 
> name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
> code know which one to use to dump? I would guess if the batchbuffer is the 
> first buffer, then it would be the last buffer, otherwise if the batch buffer 
> is the last one, I guess it would be one just before, but that might screw up 
> reloc-data if any of the relocs in the batchbuffer refer to itself. I can 
> also emit the data to a file and close the file before the ioctl and if the 
> ioctl returns, delete said file (assuming a GPU hang always 

Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Rogovin, Kevin
Hi,

 If we just want to send the data from the trace to the kernel, I can do that 
very easily: just make such a GEM BO, consisting of dword-pairs of 
(TraceCallID, BatchbufferOffset). That would be a small buffer and, together 
with the apitrace file, would give complete data. 

 I could probably make such a dedicated tool quite quickly, or add that 
functionality to the logger.

-Kevin

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Wednesday, September 27, 2017 1:21 PM
To: Rogovin, Kevin <kevin.rogo...@intel.com>; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-27 07:53:29)
> Hi,
> 
>  Right now the way the thing works is that it walks the batchbuffer just 
> after the kernel returns from the ioctl and updates its internal view of the 
> GPU state as it walks and emits to the log file the data. The log on a single 
> batchbuffer is (essentially) just a list of call ID's from the apitrace 
> together of "where in the batchbuffer" that call started. 
> 
>  I confess that I had not realized the potential application for using 
> something like this to help diagnose GPU hangs! I think it is a really good 
> idea. What I could do is the following (and it is not terribly hard to do):
> 
>1. -BEFORE- issuing the ioctl, the logger walks just the api markers in 
> the log of the batchbuffer, and makes a new GEM BO filled with apitrace data 
> (call ID, and maybe GL function data) and modify the ioctl to have an extra 
> buffer.

Yes. With the current intel_batchbuffer.c this should be relatively easy (I 
suggest you limit yourself to recent kernels for that simplification); see 
EXEC_BATCH_FIRST and remember to mark the trace bo as EXEC_OBJECT_CAPTURE.
 
>   2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
> BO; In order to read the GPU state more accurately I need to walk the log and 
> update the GPU state after the ioctl (mostly out of paranoia for values 
> copied from BO's to pipeline registers).

Up to you, but my paranoia goes the other way. Once the ioctl returns the hw is 
indeed using that memory, so I have less trust in it. If you need to tie the 
relocated pointers to the trace, I would also emit relocations into the trace. 
For post-mortem GPU hang debugging, I would want the execbuf to be complete 
before the ioctl, rather than relying on post-processing.
 
> What would happen, is that if a batchbuffer made the GPU hang, you would then 
> know all the GL commands (trace ID's from the API trace) that made stuff on 
> that batchbuffer. Then one could go back to the apitrace of the troublesome 
> application  and have a much better starting place to debug.

Yup. As time goes on, I hope this becomes a more complete flight recorder, so 
that we don't have to rely on referencing back to a separate trace to work out 
the interesting calls. My goal is that you can give one instruction (that 
doesn't require any additional dependencies, so it can just be 
LD_PRELOAD=i965-fdr.so, or better a script installed in mesa-utils?) to a bug 
reporter and that will then capture enough information.
 
> We could also do something evil looking and put another modification on 
> apitrace where it can have a list of call trace ranges where it inserts 
> glFinish after each call. Those glFinish()'s will then force the ioctl of the 
> exact troublesome draw call without needing to tell i965 to flush after each 
> draw call.
> 
> Just to make sure, you want the "apitrace" data (call ID list, maybe function 
> name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
> code know which one to use to dump? I would guess if the batchbuffer is the 
> first buffer, then it would be the last buffer, otherwise if the batch buffer 
> is the last one, I guess it would be one just before, but that might screw up 
> reloc-data if any of the relocs in the batchbuffer refer to itself. I can 
> also emit the data to a file and close the file before the ioctl and if the 
> ioctl returns, delete said file (assuming a GPU hang always stops the 
> process, then a hang would leave behind a file). 

My vision is that you would attach all "files" to the execbuf, but then again 
I'm focusing on fdr and not debugging of new features. So long as we are 
talking about a few megabytes of trace data that isn't too bad. Then we don't 
have to fiddle around with extra files to find the ones corresponding to the 
hang, as they will be recorded in the error state. The contents I leave up to 
you :) (I figure it is a snowball, once a tracing mechanism exists for 
capturing GPU hangs, there'll be lots of suggestions! One is probably just to 
capture the aub annotations alongside the batch. Hmm, that might be a good one 
for me to try 

Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Chris Wilson
Quoting Rogovin, Kevin (2017-09-27 07:53:29)
> Hi,
> 
>  Right now the way the thing works is that it walks the batchbuffer just 
> after the kernel returns from the ioctl and updates its internal view of the 
> GPU state as it walks and emits to the log file the data. The log on a single 
> batchbuffer is (essentially) just a list of call ID's from the apitrace 
> together of "where in the batchbuffer" that call started. 
> 
>  I confess that I had not realized the potential application for using 
> something like this to help diagnose GPU hangs! I think it is a really good 
> idea. What I could do is the following (and it is not terribly hard to do):
> 
>1. -BEFORE- issuing the ioctl, the logger walks just the api markers in 
> the log of the batchbuffer, and makes a new GEM BO filled with apitrace data 
> (call ID, and maybe GL function data) and modify the ioctl to have an extra 
> buffer.

Yes. With the current intel_batchbuffer.c this should be relatively easy
(I suggest you limit yourself to recent kernels for that
simplification);
see EXEC_BATCH_FIRST and remember to mark the trace bo as
EXEC_OBJECT_CAPTURE.
 
>   2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
> BO; In order to read the GPU state more accurately I need to walk the log and 
> update the GPU state after the ioctl (mostly out of paranoia for values 
> copied from BO's to pipeline registers).

Up to you, but my paranoia goes the other way. Once the ioctl returns the hw
is indeed using that memory, so I have less trust in it. If you need to
tie the relocated pointers to the trace, I would also emit relocations
into the trace. For post-mortem GPU hang debugging, I would want the
execbuf to be complete before the ioctl, rather than relying on
post-processing.
 
> What would happen, is that if a batchbuffer made the GPU hang, you would then 
> know all the GL commands (trace ID's from the API trace) that made stuff on 
> that batchbuffer. Then one could go back to the apitrace of the troublesome 
> application  and have a much better starting place to debug.

Yup. As time goes on, I hope this becomes a more complete flight recorder,
so that we don't have to rely on referencing back to a separate trace to
work out the interesting calls. My goal is that you can give one
instruction (that doesn't require any additional dependencies, so it can
just be LD_PRELOAD=i965-fdr.so, or better a script installed in mesa-utils?)
to a bug reporter and that will then capture enough information.
 
> We could also do something evil looking and put another modification on 
> apitrace where it can have a list of call trace ranges where it inserts 
> glFinish after each call. Those glFinish()'s will then force the ioctl of the 
> exact troublesome draw call without needing to tell i965 to flush after each 
> draw call.
> 
> Just to make sure, you want the "apitrace" data (call ID list, maybe function 
> name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
> code know which one to use to dump? I would guess if the batchbuffer is the 
> first buffer, then it would be the last buffer, otherwise if the batch buffer 
> is the last one, I guess it would be one just before, but that might screw up 
> reloc-data if any of the relocs in the batchbuffer refer to itself. I can 
> also emit the data to a file and close the file before the ioctl and if the 
> ioctl returns, delete said file (assuming a GPU hang always stops the 
> process, then a hang would leave behind a file). 

My vision is that you would attach all "files" to the execbuf, but then
again I'm focusing on fdr and not debugging of new features. So long
as we are talking about a few megabytes of trace data, that isn't too
bad. Then we don't have to fiddle around with extra files to find the
ones corresponding to the hang, as they will be recorded in the error
state. The contents I leave up to you :) (I figure it is a snowball:
once a tracing mechanism exists for capturing GPU hangs, there'll be
lots of suggestions! One is probably just to capture the aub annotations
alongside the batch. Hmm, that might be a good one for me to try just so
I can flesh out the fdr mechanism...)
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Rogovin, Kevin
Hi,

 In spirit, stuffing data into MI_NOOP is nicer since then one can just rely on 
aubinator to read that data and go to town. The main issues I see are the 
following.

 1. One now needs to insert MI_NOOPs into the command buffer in order to 
insert strings. This changes what is sent to the GPU (though one can argue that 
MI_NOOP should not really matter). The big nasty potential change is in 
situations where the command buffer approaches full: with the MI_NOOPs it fills 
faster, and that dramatically changes what a driver sends to the GPU, since 
a new batchbuffer triggers more state emission and -FLUSHES-.

2. It means more modifications to the driver in order to insert the messages.

3. The driver needs to somehow get a call-id from the application in order to 
know what value to place in the MI_NOOP. 

The worst issue (for me) is #1; #3 is solvable-ish by making a function 
pointer available to set the value to stuff into the MI_NOOP's unused bits. 
Issue #2 is quite icky because I have more in mind for the logger than just 
Mesa/i965, and I want to keep the work needed to add it to a driver to a bare 
minimum.
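
(For reference, a rough sketch of what stuffing a marker into MI_NOOP's spare
bits could look like: bit 22 is the "Identification Number Register Write
Enable" and bits 21:0 carry the value. The emit_dword() helper is a stand-in
for whatever the driver uses to append to the batch, not actual driver code.)

#include <assert.h>
#include <stdint.h>

#define MI_NOOP            0x00000000u        /* MI command, opcode 0x00      */
#define MI_NOOP_WRITE_ID   (1u << 22)         /* identification number enable */
#define MI_NOOP_ID_MASK    ((1u << 22) - 1)   /* bits 21:0 hold the value     */

static void
emit_noop_marker(void (*emit_dword)(uint32_t), uint32_t marker)
{
   /* Only 22 bits are available for the marker value. */
   assert((marker & ~MI_NOOP_ID_MASK) == 0);
   emit_dword(MI_NOOP | MI_NOOP_WRITE_ID | (marker & MI_NOOP_ID_MASK));
}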

FWIW, when I started this, I wanted to do it via the aub-dumper and aubinator, 
where they would produce auxiliary files that had the necessary data to know 
what in them came from where. But the more I looked at the issues I wanted to 
solve, the trickier it seemed to use aub-dumper and aubinator to accomplish 
that.

-Original Message-
From: Landwerlin, Lionel G 
Sent: Wednesday, September 27, 2017 12:35 PM
To: Rogovin, Kevin <kevin.rogo...@intel.com>; Chris Wilson 
<ch...@chris-wilson.co.uk>; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

A few months ago I implemented debug messages in the command stream by stuffing 
the unused bits of MI_NOOP :

https://patchwork.freedesktop.org/series/26079/

Aubinator would then read the bits and print the messages.

We might be able to reuse a similar idea to do away with any external 
interface. Instead of having strings of characters, we could put in a marker 
with the BO handle that could be used to store all of the metadata about a 
particular draw call.

What do you think?

-
Lionel

On 27/09/17 07:53, Rogovin, Kevin wrote:
> Hi,
>
>   Right now the way the thing works is that it walks the batchbuffer just 
> after the kernel returns from the ioctl and updates its internal view of the 
> GPU state as it walks and emits to the log file the data. The log on a single 
> batchbuffer is (essentially) just a list of call ID's from the apitrace 
> together of "where in the batchbuffer" that call started.
>
>   I confess that I had not realized the potential application for using 
> something like this to help diagnose GPU hangs! I think it is a really good 
> idea. What I could do is the following (and it is not terribly hard to do):
>
> 1. -BEFORE- issuing the ioctl, the logger walks just the api markers in 
> the log of the batchbuffer, and makes a new GEM BO filled with apitrace data 
> (call ID, and maybe GL function data) and modify the ioctl to have an extra 
> buffer.
>
>2. -AFTER- the ioctl returns, emit the log data (as now) and delete the 
> GEM BO; In order to read the GPU state more accurately I need to walk the log 
> and update the GPU state after the ioctl (mostly out of paranoia for values 
> copied from BO's to pipeline registers).
>
> What would happen, is that if a batchbuffer made the GPU hang, you would then 
> know all the GL commands (trace ID's from the API trace) that made stuff on 
> that batchbuffer. Then one could go back to the apitrace of the troublesome 
> application  and have a much better starting place to debug.
>
> We could also do something evil looking and put another modification on 
> apitrace where it can have a list of call trace ranges where it inserts 
> glFinish after each call. Those glFinish()'s will then force the ioctl of the 
> exact troublesome draw call without needing to tell i965 to flush after each 
> draw call.
>
> Just to make sure, you want the "apitrace" data (call ID list, maybe function 
> name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
> code know which one to use to dump? I would guess if the batchbuffer is the 
> first buffer, then it would be the last buffer, otherwise if the batch buffer 
> is the last one, I guess it would be one just before, but that might screw up 
> reloc-data if any of the relocs in the batchbuffer refer to itself. I can 
> also emit the data to a file and close the file before the ioctl and if the 
> ioctl returns, delete said file (assuming a GPU hang always stops the 
> process, then a hang would leave behind a file).
>
> Let me know, what is best, and I will do it.
>
> -Kevin
>
>
> -Or

Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Rogovin, Kevin
Hi,

Sigh, I forgot one -critical- issue on decoding the batchbuffer: it needs the 
reloc data from the kernel to have any chance of correctly decoding things 
referred to by the batchbuffer (which is oodles of stuff), thus decoding can 
only happen after the kernel ioctl succeeds.

However, I can make it emit a separate file before the call to the kernel 
giving at least the apitrace data (and perhaps the filename to give an 
indication too) and delete the file after the ioctl returns; from there it is 
straightforward to keep n files alive to see the last n execbuffer2 ioctls.
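
(A sketch of that fallback, assuming an arbitrary naming scheme and a depth of
N_LIVE submissions; the file names and helpers are illustrative, not part of
the series:)

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

#define N_LIVE 4   /* how many execbuffer2 submissions keep a side file alive */

static void
side_file_name(char *buf, size_t len, uint64_t submit)
{
   snprintf(buf, len, "bblog-exec-%09llu.ids", (unsigned long long)submit);
}

/* Called just before the execbuffer2 ioctl: record the apitrace call IDs. */
static void
write_side_file(uint64_t submit, const uint32_t *call_ids, size_t n)
{
   char name[64];
   side_file_name(name, sizeof(name), submit);
   FILE *f = fopen(name, "wb");
   if (f) {
      fwrite(call_ids, sizeof(call_ids[0]), n, f);
      fclose(f);
   }
}

/* Called only after the ioctl returns; if a hang kills the process, the
 * last N_LIVE side files are left behind for inspection. */
static void
prune_side_files(uint64_t submit)
{
   if (submit >= N_LIVE) {
      char name[64];
      side_file_name(name, sizeof(name), submit - N_LIVE);
      unlink(name);
   }
}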

Though, if the logger is to send the api-trace call id's to the kernel on 
execbuffer2, then this does not matter.

-Kevin

-Original Message-
From: Rogovin, Kevin 
Sent: Wednesday, September 27, 2017 9:53 AM
To: 'Chris Wilson' <ch...@chris-wilson.co.uk>; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Hi,

 Right now the way the thing works is that it walks the batchbuffer just after 
the kernel returns from the ioctl and updates its internal view of the GPU 
state as it walks and emits to the log file the data. The log on a single 
batchbuffer is (essentially) just a list of call ID's from the apitrace 
together of "where in the batchbuffer" that call started. 

 I confess that I had not realized the potential application for using 
something like this to help diagnose GPU hangs! I think it is a really good 
idea. What I could do is the following (and it is not terribly hard to do):

   1. -BEFORE- issuing the ioctl, the logger walks just the api markers in the 
log of the batchbuffer, and makes a new GEM BO filled with apitrace data (call 
ID, and maybe GL function data) and modify the ioctl to have an extra buffer.

  2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
BO; In order to read the GPU state more accurately I need to walk the log and 
update the GPU state after the ioctl (mostly out of paranoia for values copied 
from BO's to pipeline registers).

What would happen, is that if a batchbuffer made the GPU hang, you would then 
know all the GL commands (trace ID's from the API trace) that made stuff on 
that batchbuffer. Then one could go back to the apitrace of the troublesome 
application  and have a much better starting place to debug.

We could also do something evil looking and put another modification on 
apitrace where it can have a list of call trace ranges where it inserts 
glFinish after each call. Those glFinish()'s will then force the ioctl of the 
exact troublesome draw call without needing to tell i965 to flush after each 
draw call.

Just to make sure, you want the "apitrace" data (call ID list, maybe function 
name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
code know which one to use to dump? I would guess if the batchbuffer is the 
first buffer, then it would be the last buffer, otherwise if the batch buffer 
is the last one, I guess it would be one just before, but that might screw up 
reloc-data if any of the relocs in the batchbuffer refer to itself. I can also 
emit the data to a file and close the file before the ioctl and if the ioctl 
returns, delete said file (assuming a GPU hang always stops the process, then a 
hang would leave behind a file). 

Let me know, what is best, and I will do it.

-Kevin


-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
Sent: Tuesday, September 26, 2017 11:20 PM
To: Rogovin, Kevin <kevin.rogo...@intel.com>; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-26 10:35:44)
> Hi,
> 
>   Attached to this message are the following:
>  1. a file giving example usage of the tool with a modified 
> apitrace to produce json output
> 
>  2. the patches to apitrace to make it BatchbufferLogger aware
> 
>  3. the JSON files (gzipped) made from the example.
> 
> 
> I encourage (and hope) people will take a look at the JSON to see the 
> potential of the tool.

The automatic apitrace-esque logging seems very useful. How easy would it be to 
write that trace into a bo and associate with the execbuffer (from my pov, it 
shouldn't be that hard)? That way you could get the most recent actions before a 
GPU hang, attach them to a bug and decode them at leisure. (An extension may be 
to keep a ring of the last N traces so that you can see some setup a few 
batches ago that triggered a hang in this one.)

I presume you already have such a plan, and I'm just preaching to the choir.
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Lionel Landwerlin
A few months ago I implemented debug messages in the command stream by 
stuffing the unused bits of MI_NOOP :


https://patchwork.freedesktop.org/series/26079/

Aubinator would then read the bits and print the messages.

We might be able to reuse a similar idea to do away with any external 
interface. Instead of having strings of characters, we could put in a 
marker with the BO handle that could be used to store all of the metadata 
about a particular draw call.


What do you think?

-
Lionel

On 27/09/17 07:53, Rogovin, Kevin wrote:

Hi,

  Right now the way the thing works is that it walks the batchbuffer just after the 
kernel returns from the ioctl and updates its internal view of the GPU state as it walks 
and emits to the log file the data. The log on a single batchbuffer is (essentially) just 
a list of call ID's from the apitrace together of "where in the batchbuffer" 
that call started.

  I confess that I had not realized the potential application for using 
something like this to help diagnose GPU hangs! I think it is a really good 
idea. What I could do is the following (and it is not terribly hard to do):

1. -BEFORE- issuing the ioctl, the logger walks just the api markers in the 
log of the batchbuffer, and makes a new GEM BO filled with apitrace data (call 
ID, and maybe GL function data) and modify the ioctl to have an extra buffer.

   2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
BO; In order to read the GPU state more accurately I need to walk the log and 
update the GPU state after the ioctl (mostly out of paranoia for values copied 
from BO's to pipeline registers).

What would happen, is that if a batchbuffer made the GPU hang, you would then 
know all the GL commands (trace ID's from the API trace) that made stuff on 
that batchbuffer. Then one could go back to the apitrace of the troublesome 
application  and have a much better starting place to debug.

We could also do something evil looking and put another modification on 
apitrace where it can have a list of call trace ranges where it inserts 
glFinish after each call. Those glFinish()'s will then force the ioctl of the 
exact troublesome draw call without needing to tell i965 to flush after each 
draw call.

Just to make sure, you want the "apitrace" data (call ID list, maybe function 
name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug code know 
which one to use to dump? I would guess if the batchbuffer is the first buffer, then it 
would be the last buffer, otherwise if the batch buffer is the last one, I guess it would 
be one just before, but that might screw up reloc-data if any of the relocs in the 
batchbuffer refer to itself. I can also emit the data to a file and close the file before 
the ioctl and if the ioctl returns, delete said file (assuming a GPU hang always stops 
the process, then a hang would leave behind a file).

Let me know, what is best, and I will do it.

-Kevin


-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
Sent: Tuesday, September 26, 2017 11:20 PM
To: Rogovin, Kevin <kevin.rogo...@intel.com>; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-26 10:35:44)

Hi,

   Attached to this message are the following:
  1. a file giving example usage of the tool with a modified
apitrace to produce json output

  2. the patches to apitrace to make it BatchbufferLogger aware

  3. the JSON files (gzipped) made from the example.


I encourage (and hope) people will take a look at the JSON to see the potential 
of the tool.

The automatic apitrace-esque logging seems very useful. How easy would it be to 
write that trace into a bo and associate with the execbuffer (from my pov, it 
shouldn't be that hard)? That way you could get the most recent actions before a 
GPU hang, attach them to a bug and decode them at leisure. (An extension may be 
to keep a ring of the last N traces so that you can see some setup a few 
batches ago that triggered a hang in this one.)

I presume you already have such a plan, and I'm just preaching to the choir.
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Rogovin, Kevin
Hi,

 Right now, the way the thing works is that it walks the batchbuffer just after 
the kernel returns from the ioctl, updating its internal view of the GPU 
state as it walks and emitting the data to the log file. The log for a single 
batchbuffer is (essentially) just a list of call IDs from the apitrace 
together with "where in the batchbuffer" each call started. 

 I confess that I had not realized the potential of using something like this 
to help diagnose GPU hangs! I think it is a really good idea. What I could do 
is the following (and it is not terribly hard to do):

   1. -BEFORE- issuing the ioctl, the logger walks just the api markers in the 
log of the batchbuffer, makes a new GEM BO filled with apitrace data (call 
ID, and maybe GL function data), and modifies the ioctl to have an extra buffer.

  2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
BO; in order to read the GPU state more accurately I need to walk the log and 
update the GPU state after the ioctl (mostly out of paranoia about values 
copied from BOs to pipeline registers).

What would happen is that if a batchbuffer made the GPU hang, you would then 
know all the GL commands (trace IDs from the API trace) that put stuff into 
that batchbuffer. Then one could go back to the apitrace of the troublesome 
application and have a much better starting place for debugging.

We could also do something evil-looking and put another modification into 
apitrace where it takes a list of call ranges within which it inserts 
glFinish after each call. Those glFinish() calls would then force the ioctl of 
the exact troublesome draw call without needing to tell i965 to flush after 
each draw call.

Just to make sure: you want the "apitrace" data (call ID list, maybe function 
name) in a GEM BO? Which GEM BO should it be in the list, so that the kernel 
debug code knows which one to dump? I would guess that if the batchbuffer is 
the first buffer, then it would be the last buffer; otherwise, if the 
batchbuffer is the last one, I guess it would be the one just before, but that 
might screw up the reloc data if any of the relocs in the batchbuffer refer to 
itself. I can also emit the data to a file and close the file before the ioctl, 
and if the ioctl returns, delete said file (assuming a GPU hang always stops 
the process, a hang would then leave behind a file). 

Let me know what is best, and I will do it.

-Kevin


-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Tuesday, September 26, 2017 11:20 PM
To: Rogovin, Kevin <kevin.rogo...@intel.com>; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-26 10:35:44)
> Hi,
> 
>   Attached to this message are the following:
>  1. a file giving example usage of the tool with a modified 
> apitrace to produce json output
> 
>  2. the patches to apitrace to make it BatchbufferLogger aware
> 
>  3. the JSON files (gzipped) made from the example.
> 
> 
> I encourage (and hope) people will take a look at the JSON to see the 
> potential of the tool.

The automatic apitrace-esque logging seems very useful. How easy would it be to 
write that trace into a bo and associate with the execbuffer (from my pov, it 
shouldn't be that hard)? That way you could get the most recent actions before a 
GPU hang, attach them to a bug and decode them at leisure. (An extension may be 
to keep a ring of the last N traces so that you can see some setup a few 
batches ago that triggered a hang in this one.)

I presume you already have such a plan, and I'm just preaching to the choir.
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-26 Thread Chris Wilson
Quoting Rogovin, Kevin (2017-09-26 10:35:44)
> Hi,
> 
>   Attached to this message are the following:
>  1. a file giving example usage of the tool with a modified apitrace to 
> produce json output
> 
>  2. the patches to apitrace to make it BatchbufferLogger aware
> 
>  3. the JSON files (gzipped) made from the example.
> 
> 
> I encourage (and hope) people will take a look at the JSON to see the 
> potential of the tool.

The automatic apitrace-esque logging seems very useful. How easy would
it be to write that trace into a bo and associate with the execbuffer
(from my pov, it shouldn't be that hard)? That way you could get the most
recent actions before a GPU hang, attach them to a bug and decode them
at leisure. (An extension may be to keep a ring of the last N traces so
that you can see some setup a few batches ago that triggered a hang in
this one.)

I presume you already have such a plan, and I'm just preaching to the
choir.
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-25 Thread kevin . rogovin
From: Kevin Rogovin 

This patch series defines and implements a BatchbufferLogger
for Intel GEN. The main purpose of the BatchbufferLogger is
to strongly correlate API calls to the data added to a batchbuffer.
In addition to this function, the BatchbufferLogger also tracks
GPU state (respecting HW contexts as well). The logger intercepts
drmIoctl, recording the information needed to decode a batchbuffer
(such as GEM BO creation/deletion, HW context create/delete,
and most importantly execbuffer2). When the execbuffer2 returns
from the kernel, the BatchbufferLogger records in its log what
was added when, and in addition logs the GPU state (at that point
in the batchbuffer) of 3DPRIMITIVE and GPGPU_WALKER commands.
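
(As a rough illustration of that interception, not the code from patch 0016:
the logger can export its own drmIoctl, peek at the requests it cares about,
and forward everything to the real libdrm entry point found via dlsym:)

#define _GNU_SOURCE
#include <dlfcn.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

int
drmIoctl(int fd, unsigned long request, void *arg)
{
   static int (*real_drmIoctl)(int, unsigned long, void *);

   if (!real_drmIoctl)
      real_drmIoctl = (int (*)(int, unsigned long, void *))
         dlsym(RTLD_NEXT, "drmIoctl");

   switch (request) {
   case DRM_IOCTL_I915_GEM_CREATE:          /* track GEM BO creation     */
   case DRM_IOCTL_I915_GEM_CONTEXT_CREATE:  /* track HW context creation */
   case DRM_IOCTL_I915_GEM_EXECBUFFER2:     /* decode and log the batch  */
      /* ... record whatever the logger needs before/after the call ... */
      break;
   }

   return real_drmIoctl(fd, request, arg);
}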

It is the application's responsibility to tell the BatchbufferLogger
just before and just after each API call. Because of the need
to intercept drmIoctl, having an application link against the
BatchbufferLogger is not robust. Instead, an application is
to use dlsym to fetch a function pointer that returns the
BatchbufferLogger's application interface. The interface of
the BatchbufferLogger is defined in patch 0002. A script is
also provided to use the BatchbufferLogger in an easier way
than needing to set environment variables.
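
(For example, the application-side handshake could look roughly like this; the
exported symbol name and the opaque interface type are placeholders for the
real names defined in patch 0002:)

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

struct i965_batchbuffer_logger_app;   /* opaque; stands in for the real type */

typedef struct i965_batchbuffer_logger_app *(*acquire_logger_fn)(void);

static struct i965_batchbuffer_logger_app *
fetch_batchbuffer_logger(void)
{
   /* RTLD_DEFAULT searches whatever is already loaded, so the application
    * never links against the logger; if the logger is absent this returns
    * NULL and the application simply skips its pre/post call markers. */
   acquire_logger_fn acquire = (acquire_logger_fn)
      dlsym(RTLD_DEFAULT, "i965_batchbuffer_logger_app_acquire");
   return acquire ? acquire() : NULL;
}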

On the subject of application integration, I have a -very-
small patch series that enables the BatchbufferLogger for
apitrace. I can share these patches if anyone asks, but I
cannot submit them to apitrace until at least the BatchbufferLogger
is in Mesa with a stable application interface.

The log emitted by the BatchbufferLogger is a sequence of blocks,
with the possibility of blocks being elements of other blocks. The
top-level blocks are the API call markers created from the calls
into the BatchbufferLogger from the application. The format of the
log is defined in a dedicated header (from patch 0003). Tools are
included to convert the log to JSON, XML and text. The simple
file format should allow others to take the data and use it
however they see fit. The JSON output alone can be quite
illuminating when debugging/enhancing the i965 driver for a
single frame of a troublesome application.

The patch series is organized into the following blocks:

0001-0003: Define the BatchbufferLogger interfaces
0004-0005: Minor fixes to i965 driver
0006-0006: Hooking of BatchbufferLogger into i965
0007-0015: Fixes and enhancements to intel/compiler,
   intel/tools and intel/common.
0016-0018: Implementation of BatchBufferLogger
0019-0021: Tools to decode log to JSON, XML and text
0022-0022: Command line tool for disassembling shader
   binaries.

Kevin Rogovin (22):
  intel/tools: define BatchBufferLogger driver interface
  intel/tools: define BatchbufferLogger application interface
  intel/tools: BatchBufferLogger define output file format of tool
  i965: assign BindingTableEntryCount of INTERFACE_DESCRIPTOR_DATA
  i965: correctly assign SamplerCount of INTERFACE_DESCRIPTOR_DATA
  i965: Enable BatchbufferLogger in i965 driver
  intel/common/gen_decoder: make useable from C++ source
  intel/compiler: brw_validate_instructions to take const void* instead
of void*
  intel/compiler: fix for memmove argument on annotating error
  intel/compiler:add function to give option to print offsets into
assembly
  intel/tools/disasm: correctly observe FILE *out parameter
  intel/tools/disasm: make useable from C++ sources
  intel/tools/disasm: gen_disasm_disassemble to take const void* instead
of void*
  intel/tools/disasm: add gen_disasm_assembly_length function
  intel/tools/disasm: make sure that entire range is disassembled
  intel/tools/BatchbufferLogger: first implementation
  intel/tools/BatchbufferLogger: install i965_batchbuffer non-driver
interface headers
  inte/tools/BatchbufferLogger : add shell script for batchbuffer logger
  intel/tools/BatchbufferLogger (txt-output): example txt dumper
  intel/tools/BatchbufferLogger (output-xml): add outputter to XML
  intel/tools/BatchbufferLogger (output-json): add json outputter
  intel/tools: add command line GEN shader disassembler tool

 src/intel/Makefile.tools.am|   71 +
 src/intel/common/gen_decoder.h |7 +
 src/intel/common/gen_device_info.h |8 +
 src/intel/compiler/brw_eu.c|   11 +-
 src/intel/compiler/brw_eu.h|5 +-
 src/intel/compiler/brw_eu_validate.c   |2 +-
 src/intel/compiler/intel_asm_annotation.c  |5 +-
 src/intel/tools/.gitignore |5 +
 src/intel/tools/disasm.c   |   28 +-
 src/intel/tools/gen_disasm.h   |   12 +-
 src/intel/tools/gen_shader_disassembler.c  |  218 +
 src/intel/tools/i965_batchbuffer_dump_show.c   |  129 +
 .../tools/i965_batchbuffer_dump_show_json.cpp  |  251 ++