Re: [Intel-gfx] [PATCH i-g-t] scripts/trace.pl: Optimize event parsing and processing

2017-12-21 Thread John Harrison

On 12/21/2017 1:34 AM, Tvrtko Ursulin wrote:


On 20/12/2017 23:50, John Harrison wrote:

On 12/20/2017 1:54 AM, Tvrtko Ursulin wrote:
What was the effect of all this on your big traces? I am only 
testing with a smaller one which goes from ~3.3s to ~2.2s. On a 
larger trace it might be non-linear gains due to double sort 
avoidance, unless there will be some other effects to cancel that out.




So with a trace of a shortened gem_exec_nop/basic_sequential, the 
'perf script' output is 439MB and the original trace.pl before any of 
the changes took ~180s. After the 'auto-detect field order' patch, it 
went up to ~201s. With the optimisation patch it is down to ~129s.


However, I am also seeing differences in the HTML output since the 
optimisation patch. The differences aren't massive, just slight 
variations in the times. The structure is all the same, its just that 
the accounting and/or time stamps are out.


For example:
 {id: 1, content: 'Ring079.48% idle34.32% 
busy584.97% runnable2103.60% queued16.18% 
wait200931 batches331.28us avg batch331.38us avg 
engine batch'},

vs
 {id: 1, content: 'Ring079.48% idle34.32% 
busy584.97% runnable2103.60% queued16.18% 
wait200931 batches338.56us avg batch338.56us avg 
engine batch'},


Or:
 {id: 58, key: -210383407,  type: 'range', group: 4, subgroup: 2, 
subgroupOrder: 3, content: '428/3 0 
??? ++ 142us 
(0us)', start: '2017-01-05 21:27:45.352968', end: 
'2017-01-05 21:27:45.353110', style: 'color: white; background-color: 
red;'},

vs
 {id: 58, key: -210383407,  type: 'range', group: 4, subgroup: 2, 
subgroupOrder: 3, content: '428/3 0 
??? ++ 159us 
(0us)', start: '2017-01-05 21:27:45.352968', end: 
'2017-01-05 21:27:45.353127', style: 'color: white; background-color: 
red;'},


I can send you the full output if it is useful and the source logs 
too. The HTML output is about 840KB but as noted, the perf logs are 
hundreds of MBs.


I was able to reproduce it. I think it's down to floating point 
mishandling. Neither the original nor the optimized version were 
correct and both are accumulating error. I'll send an updated patch 
shortly.


Regards,

Tvrtko


The new version with +0.5 gives me exactly the same results as the 
previous edition without the rounding. If I get rid of the int() 
conversion too, I still get the same results.


What does make a difference is if I fix up the broken processing of 
notify traces:

diff --git a/scripts/trace.pl b/scripts/trace.pl
index 8dbd497b..20ae3c78 100755
--- a/scripts/trace.pl
+++ b/scripts/trace.pl
@@ -382,14 +382,16 @@ while (<>) {

    next if exists $tp{'ring'} and exists $ignore_ring{$tp{'ring'}};

-   if (exists $tp{'ring'} and exists $tp{'ctx'} and exists 
$tp{'seqno'}) {

+   if (exists $tp{'ring'} and exists $tp{'seqno'}) {
    $ring = $tp{'ring'};
-   $ctx = $tp{'ctx'};
    $seqno = $tp{'seqno'};

-   $orig_ctx = $ctx;
-   $ctx = sanitize_ctx($ctx, $ring);
-   $key = db_key($ring, $ctx, $seqno);
+   if (exists $tp{'ctx'}) {
+   $ctx = $tp{'ctx'};
+   $orig_ctx = $ctx;
+   $ctx = sanitize_ctx($ctx, $ring);
+   $key = db_key($ring, $ctx, $seqno);
+   }
    }

    if ($tp_name eq 'i915:i915_gem_request_wait_begin:') {
@@ -461,7 +463,7 @@ while (<>) {
    }
    $db{$key}->{'duration'} = $db{$key}->{'notify'} - 
$db{$key}->{'start'};
    $db{$key}->{'context-complete-delay'} = 
$db{$key}->{'end'} - $db{$key}->{'notify'};

-   } elsif ($tp_name eq 'intel_engine_notify:') {
+   } elsif ($tp_name eq 'i915:intel_engine_notify:') {
    $notify{global_key($ring, $seqno)} = $time;
    } elsif ($tp_name eq 'i915:intel_gpu_freq_change:') {
    push @freqs, [$prev_freq_ts, $time, $prev_freq] if 
$prev_freq;


With that, I get exactly the same HTML page as before the optimisation 
patch.


John.

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] scripts/trace.pl: Optimize event parsing and processing

2017-12-21 Thread Tvrtko Ursulin


On 20/12/2017 23:50, John Harrison wrote:

On 12/20/2017 1:54 AM, Tvrtko Ursulin wrote:
What was the effect of all this on your big traces? I am only testing 
with a smaller one which goes from ~3.3s to ~2.2s. On a larger trace 
it might be non-linear gains due to double sort avoidance, unless 
there will be some other effects to cancel that out.




So with a trace of a shortened gem_exec_nop/basic_sequential, the 'perf 
script' output is 439MB and the original trace.pl before any of the 
changes took ~180s. After the 'auto-detect field order' patch, it went 
up to ~201s. With the optimisation patch it is down to ~129s.


However, I am also seeing differences in the HTML output since the 
optimisation patch. The differences aren't massive, just slight 
variations in the times. The structure is all the same, its just that 
the accounting and/or time stamps are out.


For example:
     {id: 1, content: 'Ring079.48% idle34.32% 
busy584.97% runnable2103.60% queued16.18% 
wait200931 batches331.28us avg batch331.38us avg engine 
batch'},

vs
     {id: 1, content: 'Ring079.48% idle34.32% 
busy584.97% runnable2103.60% queued16.18% 
wait200931 batches338.56us avg batch338.56us avg engine 
batch'},


Or:
     {id: 58, key: -210383407,  type: 'range', group: 4, subgroup: 2, 
subgroupOrder: 3, content: '428/3 0 
??? ++ 142us 
(0us)', start: '2017-01-05 21:27:45.352968', end: 
'2017-01-05 21:27:45.353110', style: 'color: white; background-color: 
red;'},

vs
     {id: 58, key: -210383407,  type: 'range', group: 4, subgroup: 2, 
subgroupOrder: 3, content: '428/3 0 
??? ++ 159us 
(0us)', start: '2017-01-05 21:27:45.352968', end: 
'2017-01-05 21:27:45.353127', style: 'color: white; background-color: 
red;'},


I can send you the full output if it is useful and the source logs too. 
The HTML output is about 840KB but as noted, the perf logs are hundreds 
of MBs.


I was able to reproduce it. I think it's down to floating point 
mishandling. Neither the original nor the optimized version were correct 
and both are accumulating error. I'll send an updated patch shortly.


Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] scripts/trace.pl: Optimize event parsing and processing

2017-12-20 Thread John Harrison

On 12/20/2017 1:54 AM, Tvrtko Ursulin wrote:
What was the effect of all this on your big traces? I am only testing 
with a smaller one which goes from ~3.3s to ~2.2s. On a larger trace 
it might be non-linear gains due to double sort avoidance, unless 
there will be some other effects to cancel that out.




So with a trace of a shortened gem_exec_nop/basic_sequential, the 'perf 
script' output is 439MB and the original trace.pl before any of the 
changes took ~180s. After the 'auto-detect field order' patch, it went 
up to ~201s. With the optimisation patch it is down to ~129s.


However, I am also seeing differences in the HTML output since the 
optimisation patch. The differences aren't massive, just slight 
variations in the times. The structure is all the same, its just that 
the accounting and/or time stamps are out.


For example:
    {id: 1, content: 'Ring079.48% idle34.32% 
busy584.97% runnable2103.60% queued16.18% 
wait200931 batches331.28us avg batch331.38us avg engine 
batch'},

vs
    {id: 1, content: 'Ring079.48% idle34.32% 
busy584.97% runnable2103.60% queued16.18% 
wait200931 batches338.56us avg batch338.56us avg engine 
batch'},


Or:
    {id: 58, key: -210383407,  type: 'range', group: 4, subgroup: 2, 
subgroupOrder: 3, content: '428/3 0 
??? ++ 142us 
(0us)', start: '2017-01-05 21:27:45.352968', end: 
'2017-01-05 21:27:45.353110', style: 'color: white; background-color: 
red;'},

vs
    {id: 58, key: -210383407,  type: 'range', group: 4, subgroup: 2, 
subgroupOrder: 3, content: '428/3 0 
??? ++ 159us 
(0us)', start: '2017-01-05 21:27:45.352968', end: 
'2017-01-05 21:27:45.353127', style: 'color: white; background-color: 
red;'},


I can send you the full output if it is useful and the source logs too. 
The HTML output is about 840KB but as noted, the perf logs are hundreds 
of MBs.


John.

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] scripts/trace.pl: Optimize event parsing and processing

2017-12-20 Thread Tvrtko Ursulin


On 19/12/2017 16:23, John Harrison wrote:

On 12/19/2017 3:15 AM, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

A couple of small optimizations which altogether bring around 30%
improvement in my testing.

1. Do less string processing on tracepoints names and push more of the
    check into the if-ladder.

2. Pull out common db key and ctx processing and cache common values in
    local vars.

3. Key value pair parsing is faster with a regexp.

4. Avoid sorting the db hash multiple times if possible.

Signed-off-by: Tvrtko Ursulin 
Cc: John Harrison 
---
  scripts/trace.pl | 114 
---

  1 file changed, 49 insertions(+), 65 deletions(-)
---
John please check if this helps with your monster traces before we decide
if it is worth putting this in. Well, there isn't much of a downside
except touching something which works and risking breaking it.

diff --git a/scripts/trace.pl b/scripts/trace.pl
index 98e4a9843a43..5463e1fd1954 100755
--- a/scripts/trace.pl
+++ b/scripts/trace.pl
@@ -356,110 +356,91 @@ my $prev_freq = 0;
  my $prev_freq_ts = 0;
  while (<>) {
  my @fields;
-    my @tmp;
  my $tp_name;
-    my $time;
  my %tp;
-    my $key;
+    my ($time, $ctx, $ring, $seqno, $orig_ctx, $key);
  chomp;
  @fields = split ' ';
+    chop $fields[3];
+    $time = int($fields[3] * 100.0);
+
  $tp_name = $fields[4];
-    @tmp = split ':', $tp_name, 2;
-    next unless $tmp[0] eq 'i915';
-    $tp_name = $tmp[1];
-    chop $tp_name;
-    chop $fields[3];
-    $time = $fields[3] * 100.0;
  splice @fields, 0, 5;
  foreach my $f (@fields) {
-    my @kv = split '=|,', $f;
-
-    $kv[0] = 'global' if $kv[0] eq 'global_seqno';
+    my ($k, $v);
-    $tp{$kv[0]} = $kv[1];
+    ($k, $v) = ($f =~ /^(\S+)=(\d+)/);
I recall something about using () selection is slower than using 
pre-defined values. E.g.

   $f =~ m/=/; ($k, $v) = ($`, $');

Or do you specifically need the filtering of only matching numeric values?


It doesn't matter at the moment, all the script touches just happens to 
be numeric so either way works. I did not know of this trick and it is 
indeed marginally faster. So I've sent an update with it included.


What was the effect of all this on your big traces? I am only testing 
with a smaller one which goes from ~3.3s to ~2.2s. On a larger trace it 
might be non-linear gains due to double sort avoidance, unless there 
will be some other effects to cancel that out.



+    next unless defined $k;
+    $k = 'global' if $k eq 'global_seqno';
+    $tp{$k} = $v;
  }
  next if exists $tp{'ring'} and exists $ignore_ring{$tp{'ring'}};
-    if ($tp_name eq 'i915_gem_request_wait_begin') {
-    my %rw;
+    if (exists $tp{'ring'} and exists $tp{'ctx'} and exists 
$tp{'seqno'}) {

+    $ring = $tp{'ring'};
+    $ctx = $tp{'ctx'};
+    $seqno = $tp{'seqno'};
+
+    $orig_ctx = $ctx;
+    $ctx = sanitize_ctx($ctx, $ring);
+    $key = db_key($ring, $ctx, $seqno);
+    }
-    $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-    $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
+    if ($tp_name eq 'i915:i915_gem_request_wait_begin:') {
+    my %rw;
  next if exists $reqwait{$key};
  $rw{'key'} = $key;
-    $rw{'ring'} = $tp{'ring'};
-    $rw{'seqno'} = $tp{'seqno'};
-    $rw{'ctx'} = $tp{'ctx'};
+    $rw{'ring'} = $ring;
+    $rw{'seqno'} = $seqno;
+    $rw{'ctx'} = $ctx;
  $rw{'start'} = $time;
  $reqwait{$key} = \%rw;
-    next;
-    } elsif ($tp_name eq 'i915_gem_request_wait_end') {
-    $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-    $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
-
+    } elsif ($tp_name eq 'i915:i915_gem_request_wait_end:') {
  next unless exists $reqwait{$key};
  $reqwait{$key}->{'end'} = $time;
-    next;
-    } elsif ($tp_name eq 'i915_gem_request_add') {
-    my $orig_ctx = $tp{'ctx'};
-
-    $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-    $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
-
+    } elsif ($tp_name eq 'i915:i915_gem_request_add:') {
  if (exists $queue{$key}) {
  $ctxdb{$orig_ctx}++;
-    $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-    $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
+    $ctx = sanitize_ctx($ctx, $ring);
This bit originally worked on $orig_ctx not $ctx (i.e., before your 
previous refactoring patch). Which should it be? Will a double 
sanitisation step work as intended? I don't actually hit this line in 
any of the runs I've done.


Yeah well spotted, copy and paste error and fix sent. R-b would be cool 
on it.


This only triggers if you trace something which creates and destroys 
contexts due context id recycling. Here it 

Re: [Intel-gfx] [PATCH i-g-t] scripts/trace.pl: Optimize event parsing and processing

2017-12-19 Thread John Harrison

On 12/19/2017 3:15 AM, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

A couple of small optimizations which altogether bring around 30%
improvement in my testing.

1. Do less string processing on tracepoints names and push more of the
check into the if-ladder.

2. Pull out common db key and ctx processing and cache common values in
local vars.

3. Key value pair parsing is faster with a regexp.

4. Avoid sorting the db hash multiple times if possible.

Signed-off-by: Tvrtko Ursulin 
Cc: John Harrison 
---
  scripts/trace.pl | 114 ---
  1 file changed, 49 insertions(+), 65 deletions(-)
---
John please check if this helps with your monster traces before we decide
if it is worth putting this in. Well, there isn't much of a downside
except touching something which works and risking breaking it.

diff --git a/scripts/trace.pl b/scripts/trace.pl
index 98e4a9843a43..5463e1fd1954 100755
--- a/scripts/trace.pl
+++ b/scripts/trace.pl
@@ -356,110 +356,91 @@ my $prev_freq = 0;
  my $prev_freq_ts = 0;
  while (<>) {
my @fields;
-   my @tmp;
my $tp_name;
-   my $time;
my %tp;
-   my $key;
+   my ($time, $ctx, $ring, $seqno, $orig_ctx, $key);
  
  	chomp;

@fields = split ' ';
  
+	chop $fields[3];

+   $time = int($fields[3] * 100.0);
+
$tp_name = $fields[4];
-   @tmp = split ':', $tp_name, 2;
-   next unless $tmp[0] eq 'i915';
-   $tp_name = $tmp[1];
-   chop $tp_name;
  
-	chop $fields[3];

-   $time = $fields[3] * 100.0;
splice @fields, 0, 5;
  
  	foreach my $f (@fields) {

-   my @kv = split '=|,', $f;
-
-   $kv[0] = 'global' if $kv[0] eq 'global_seqno';
+   my ($k, $v);
  
-		$tp{$kv[0]} = $kv[1];

+   ($k, $v) = ($f =~ /^(\S+)=(\d+)/);
I recall something about using () selection is slower than using 
pre-defined values. E.g.

  $f =~ m/=/; ($k, $v) = ($`, $');

Or do you specifically need the filtering of only matching numeric values?


+   next unless defined $k;
+   $k = 'global' if $k eq 'global_seqno';
+   $tp{$k} = $v;
}
  
  	next if exists $tp{'ring'} and exists $ignore_ring{$tp{'ring'}};
  
-	if ($tp_name eq 'i915_gem_request_wait_begin') {

-   my %rw;
+   if (exists $tp{'ring'} and exists $tp{'ctx'} and exists $tp{'seqno'}) {
+   $ring = $tp{'ring'};
+   $ctx = $tp{'ctx'};
+   $seqno = $tp{'seqno'};
+
+   $orig_ctx = $ctx;
+   $ctx = sanitize_ctx($ctx, $ring);
+   $key = db_key($ring, $ctx, $seqno);
+   }
  
-		$tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});

-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
+   if ($tp_name eq 'i915:i915_gem_request_wait_begin:') {
+   my %rw;
  
  		next if exists $reqwait{$key};
  
  		$rw{'key'} = $key;

-   $rw{'ring'} = $tp{'ring'};
-   $rw{'seqno'} = $tp{'seqno'};
-   $rw{'ctx'} = $tp{'ctx'};
+   $rw{'ring'} = $ring;
+   $rw{'seqno'} = $seqno;
+   $rw{'ctx'} = $ctx;
$rw{'start'} = $time;
$reqwait{$key} = \%rw;
-   next;
-   } elsif ($tp_name eq 'i915_gem_request_wait_end') {
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
-
+   } elsif ($tp_name eq 'i915:i915_gem_request_wait_end:') {
next unless exists $reqwait{$key};
  
  		$reqwait{$key}->{'end'} = $time;

-   next;
-   } elsif ($tp_name eq 'i915_gem_request_add') {
-   my $orig_ctx = $tp{'ctx'};
-
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
-
+   } elsif ($tp_name eq 'i915:i915_gem_request_add:') {
if (exists $queue{$key}) {
$ctxdb{$orig_ctx}++;
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
+   $ctx = sanitize_ctx($ctx, $ring);
This bit originally worked on $orig_ctx not $ctx (i.e., before your 
previous refactoring patch). Which should it be? Will a double 
sanitisation step work as intended? I don't actually hit this line in 
any of the runs I've done.




+   $key = db_key($ring, $ctx, $seqno);
}
  
  		$queue{$key} = $time;

-   next;
-   } elsif ($tp_name eq 'i915_gem_request_submit') {
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
-
+   } elsif ($tp_name eq 'i915:i915_gem_request_submit:') {
  

Re: [Intel-gfx] [PATCH i-g-t] scripts/trace.pl: Optimize event parsing and processing

2017-12-19 Thread Chris Wilson
Quoting Tvrtko Ursulin (2017-12-19 11:15:39)
> From: Tvrtko Ursulin 
> 
> A couple of small optimizations which altogether bring around 30%
> improvement in my testing.
> 
> 1. Do less string processing on tracepoints names and push more of the
>check into the if-ladder.
> 
> 2. Pull out common db key and ctx processing and cache common values in
>local vars.
> 
> 3. Key value pair parsing is faster with a regexp.
> 
> 4. Avoid sorting the db hash multiple times if possible.
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: John Harrison 
> ---
>  scripts/trace.pl | 114 
> ---
>  1 file changed, 49 insertions(+), 65 deletions(-)
> ---
> John please check if this helps with your monster traces before we decide
> if it is worth putting this in. Well, there isn't much of a downside
> except touching something which works and risking breaking it.

Could you run the same trace and do something like
time trace1 < input > output1.html
time trace2 < input > output2.html
cmp output1.html output2.html
?
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH i-g-t] scripts/trace.pl: Optimize event parsing and processing

2017-12-19 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

A couple of small optimizations which altogether bring around 30%
improvement in my testing.

1. Do less string processing on tracepoints names and push more of the
   check into the if-ladder.

2. Pull out common db key and ctx processing and cache common values in
   local vars.

3. Key value pair parsing is faster with a regexp.

4. Avoid sorting the db hash multiple times if possible.

Signed-off-by: Tvrtko Ursulin 
Cc: John Harrison 
---
 scripts/trace.pl | 114 ---
 1 file changed, 49 insertions(+), 65 deletions(-)
---
John please check if this helps with your monster traces before we decide
if it is worth putting this in. Well, there isn't much of a downside
except touching something which works and risking breaking it.

diff --git a/scripts/trace.pl b/scripts/trace.pl
index 98e4a9843a43..5463e1fd1954 100755
--- a/scripts/trace.pl
+++ b/scripts/trace.pl
@@ -356,110 +356,91 @@ my $prev_freq = 0;
 my $prev_freq_ts = 0;
 while (<>) {
my @fields;
-   my @tmp;
my $tp_name;
-   my $time;
my %tp;
-   my $key;
+   my ($time, $ctx, $ring, $seqno, $orig_ctx, $key);
 
chomp;
@fields = split ' ';
 
+   chop $fields[3];
+   $time = int($fields[3] * 100.0);
+
$tp_name = $fields[4];
-   @tmp = split ':', $tp_name, 2;
-   next unless $tmp[0] eq 'i915';
-   $tp_name = $tmp[1];
-   chop $tp_name;
 
-   chop $fields[3];
-   $time = $fields[3] * 100.0;
splice @fields, 0, 5;
 
foreach my $f (@fields) {
-   my @kv = split '=|,', $f;
-
-   $kv[0] = 'global' if $kv[0] eq 'global_seqno';
+   my ($k, $v);
 
-   $tp{$kv[0]} = $kv[1];
+   ($k, $v) = ($f =~ /^(\S+)=(\d+)/);
+   next unless defined $k;
+   $k = 'global' if $k eq 'global_seqno';
+   $tp{$k} = $v;
}
 
next if exists $tp{'ring'} and exists $ignore_ring{$tp{'ring'}};
 
-   if ($tp_name eq 'i915_gem_request_wait_begin') {
-   my %rw;
+   if (exists $tp{'ring'} and exists $tp{'ctx'} and exists $tp{'seqno'}) {
+   $ring = $tp{'ring'};
+   $ctx = $tp{'ctx'};
+   $seqno = $tp{'seqno'};
+
+   $orig_ctx = $ctx;
+   $ctx = sanitize_ctx($ctx, $ring);
+   $key = db_key($ring, $ctx, $seqno);
+   }
 
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
+   if ($tp_name eq 'i915:i915_gem_request_wait_begin:') {
+   my %rw;
 
next if exists $reqwait{$key};
 
$rw{'key'} = $key;
-   $rw{'ring'} = $tp{'ring'};
-   $rw{'seqno'} = $tp{'seqno'};
-   $rw{'ctx'} = $tp{'ctx'};
+   $rw{'ring'} = $ring;
+   $rw{'seqno'} = $seqno;
+   $rw{'ctx'} = $ctx;
$rw{'start'} = $time;
$reqwait{$key} = \%rw;
-   next;
-   } elsif ($tp_name eq 'i915_gem_request_wait_end') {
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
-
+   } elsif ($tp_name eq 'i915:i915_gem_request_wait_end:') {
next unless exists $reqwait{$key};
 
$reqwait{$key}->{'end'} = $time;
-   next;
-   } elsif ($tp_name eq 'i915_gem_request_add') {
-   my $orig_ctx = $tp{'ctx'};
-
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
-
+   } elsif ($tp_name eq 'i915:i915_gem_request_add:') {
if (exists $queue{$key}) {
$ctxdb{$orig_ctx}++;
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
+   $ctx = sanitize_ctx($ctx, $ring);
+   $key = db_key($ring, $ctx, $seqno);
}
 
$queue{$key} = $time;
-   next;
-   } elsif ($tp_name eq 'i915_gem_request_submit') {
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'}, $tp{'ctx'}, $tp{'seqno'});
-
+   } elsif ($tp_name eq 'i915:i915_gem_request_submit:') {
die if exists $submit{$key};
die unless exists $queue{$key};
 
$submit{$key} = $time;
-   next;
-   } elsif ($tp_name eq 'i915_gem_request_in') {
+   } elsif ($tp_name eq 'i915:i915_gem_request_in:') {
my %req;
 
-   $tp{'ctx'} = sanitize_ctx($tp{'ctx'}, $tp{'ring'});
-   $key = db_key($tp{'ring'},