Re: [Intel-gfx] [PATCH 6/7] drm/i915/pmu: Add running counter

2018-06-07 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-06-07 14:25:28)
> From: Tvrtko Ursulin 
> 
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
> 
> This is useful to analyze the overall load of the system.
> 
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
> 
> v3:
>  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
> 
> v4:
>  * Refactored for timer period accounting.
> 
> v5:
>  * Avoid 64-division. (Chris Wilson)
> 
> v6:
>  * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
>  * Change counter scale to avoid multiplication in readout and increase
>    counter headroom.
> 
> Signed-off-by: Tvrtko Ursulin 

I can't spot any nits to pick. That means I actually have to review it
now, right?
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 6/7] drm/i915/pmu: Add running counter

2018-06-06 Thread Tvrtko Ursulin


On 06/06/2018 16:23, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
> > From: Tvrtko Ursulin
> >
> > We add a PMU counter to expose the number of requests currently executing
> > on the GPU.
> >
> > This is useful to analyze the overall load of the system.
> >
> > v2:
> >  * Rebase.
> >  * Drop floating point constant. (Chris Wilson)
> >
> > v3:
> >  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
> >
> > v4:
> >  * Refactored for timer period accounting.
> >
> > v5:
> >  * Avoid 64-division. (Chris Wilson)
> >
> > Signed-off-by: Tvrtko Ursulin
> > ---
> >  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
> >
> > @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
> >                                 div_u64((u64)period_ns *
> >                                         I915_SAMPLE_QUEUED_DIVISOR,
> >                                         100));
> > +
> > +               if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
> > +                       add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
> > +                                       last_seqno - current_seqno,
> > +                                       div_u64((u64)period_ns *
> > +                                               I915_SAMPLE_QUEUED_DIVISOR,
> > +                                               100));
>
> Are we worried about losing precision with qd.ns?
>
> add_sample_mult(SAMPLE, x, period_ns); here
>
> > @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
> >                 val = engine->pmu.sample[sample].cur;
> >
> >                 if (sample == I915_SAMPLE_QUEUED ||
> > -                   sample == I915_SAMPLE_RUNNABLE)
> > +                   sample == I915_SAMPLE_RUNNABLE ||
> > +                   sample == I915_SAMPLE_RUNNING)
> >                         val = div_u64(val, MSEC_PER_SEC);  /* to qd */
>
> and val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);

Yeah that works, thanks.

> So that gives us a limit of ~1 million qd (assuming the user cares for
> about 1s intervals). Up to 8 million wlog with
>
> val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);

Or keep in qd.us as for frequency. I think precision is plenty in any case.

> Anyway, just concerned to have more than one 64b division and want to
> provoke you into thinking of a way of avoiding it :)

It is an optimized 64-bit divide, or 64-divide as I faltered in the
commit message :), so not as bad as 64/64, but still your idea is very good.

Regards,

Tvrtko
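
[Editor's note] The headroom argument above can be checked with a small userspace sketch. It compares the suggested readout with the variant where both factors are divided by 8. The constant values and the helper names (max_val_before_wrap, read_scaled_qd, read_scaled_qd_div8) are assumptions made for illustration; only the scale of 1024 and the two readout expressions come from the thread, and this is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: the thread states a scale of 1024; NSEC_PER_SEC is 1e9. */
#define SAMPLE_QUEUED_DIVISOR 1024u
#define NSEC_PER_SEC 1000000000u

/* Largest accumulated value for which val * mul still fits in 64 bits. */
static inline uint64_t max_val_before_wrap(uint32_t mul)
{
	assert(mul != 0);
	return UINT64_MAX / mul;
}

/* Readout as first suggested: scale by 1024, divide by 1e9. */
static inline uint64_t read_scaled_qd(uint64_t val)
{
	return (val * SAMPLE_QUEUED_DIVISOR) / NSEC_PER_SEC;
}

/*
 * Readout with both factors pre-divided by 8. Both constants are exact
 * multiples of 8, so the quotient is unchanged, but the intermediate
 * product is 8 times smaller and wraps 8 times later.
 */
static inline uint64_t read_scaled_qd_div8(uint64_t val)
{
	return (val * (SAMPLE_QUEUED_DIVISOR / 8)) / (NSEC_PER_SEC / 8);
}
```

For any value that does not wrap in the first form, the two helpers return the same result; the second form simply tolerates an accumulated value eight times larger.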


Re: [Intel-gfx] [PATCH 6/7] drm/i915/pmu: Add running counter

2018-06-06 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-06-06 15:40:10)
> From: Tvrtko Ursulin 
> 
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
> 
> This is useful to analyze the overall load of the system.
> 
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
> 
> v3:
>  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
> 
> v4:
>  * Refactored for timer period accounting.
> 
> v5:
>  * Avoid 64-division. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin 
> ---
>  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
>  
> @@ -226,6 +227,13 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>                                 div_u64((u64)period_ns *
>                                         I915_SAMPLE_QUEUED_DIVISOR,
>                                         100));
> +
> +               if (engine->pmu.enable & BIT(I915_SAMPLE_RUNNING))
> +                       add_sample_mult(&engine->pmu.sample[I915_SAMPLE_RUNNING],
> +                                       last_seqno - current_seqno,
> +                                       div_u64((u64)period_ns *
> +                                               I915_SAMPLE_QUEUED_DIVISOR,
> +                                               100));

Are we worried about losing precision with qd.ns?

add_sample_mult(SAMPLE, x, period_ns); here

> @@ -560,7 +569,8 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
> val = engine->pmu.sample[sample].cur;
>  
> if (sample == I915_SAMPLE_QUEUED ||
> -   sample == I915_SAMPLE_RUNNABLE)
> +   sample == I915_SAMPLE_RUNNABLE ||
> +   sample == I915_SAMPLE_RUNNING)
> val = div_u64(val, MSEC_PER_SEC);  /* to qd */

and val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);

So that gives us a limit of ~1 million qd (assuming the user cares for
about 1s intervals). Up to 8 million wlog with

val = div_u64(val * I915_SAMPLE_QUEUED_DIVISOR/8, NSEC_PER_SEC/8);

Anyway, just concerned to have more than one 64b division and want to
provoke you into thinking of a way of avoiding it :)
-Chris
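
[Editor's note] A minimal userspace sketch of the scheme suggested above: accumulate in qd.ns on every sampling tick with no division, then convert once at readout. The struct, the constant values and read_scaled_qd() are hypothetical stand-ins; div_u64() and add_sample_mult() here only imitate the kernel helpers of the same name so the example builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: the thread states a scale of 1024; NSEC_PER_SEC is 1e9. */
#define SAMPLE_QUEUED_DIVISOR 1024u
#define NSEC_PER_SEC 1000000000u

/* Userspace stand-in for the kernel's div_u64(): u64 dividend, u32 divisor. */
static inline uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Hypothetical accumulator holding requests * nanoseconds, i.e. qd.ns. */
struct sample {
	uint64_t cur;
};

/* Per-tick accounting: one multiply and one add, no division. */
static inline void add_sample_mult(struct sample *s, uint32_t val, uint32_t mul)
{
	s->cur += (uint64_t)val * mul;
}

/* Readout: a single 64-by-32 division turns qd.ns into qd scaled by 1024. */
static inline uint64_t read_scaled_qd(const struct sample *s)
{
	/* The product must not wrap; this is what bounds the usable range. */
	assert(s->cur <= UINT64_MAX / SAMPLE_QUEUED_DIVISOR);
	return div_u64(s->cur * SAMPLE_QUEUED_DIVISOR, NSEC_PER_SEC);
}
```

As a usage example, calling add_sample_mult(&s, 3, 1000000) one thousand times (three requests observed on each of 1000 ticks of 1 ms) leaves s.cur at 3000000000 qd.ns, and read_scaled_qd(&s) then returns 3072, which is 3.0 at a scale of 1024.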


Re: [Intel-gfx] [PATCH 6/7] drm/i915/pmu: Add running counter

2018-04-09 Thread Tvrtko Ursulin


On 06/04/2018 21:24, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-05 13:39:22)
> > From: Tvrtko Ursulin
> >
> > We add a PMU counter to expose the number of requests currently executing
> > on the GPU.
> >
> > This is useful to analyze the overall load of the system.
> >
> > v2:
> >  * Rebase.
> >  * Drop floating point constant. (Chris Wilson)
> >
> > v3:
> >  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
> >
> > Signed-off-by: Tvrtko Ursulin
>
> Reviewed-by: Chris Wilson
>
> Do we want these separate in the final push? Is there value in reverting
> one but not the others? They seem a triumvirate.

I think the only benefit to have them separate for me was that rebasing
was marginally easier. I can just as well squash them if that is preferred.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH 6/7] drm/i915/pmu: Add running counter

2018-04-06 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-04-05 13:39:22)
> From: Tvrtko Ursulin 
> 
> We add a PMU counter to expose the number of requests currently executing
> on the GPU.
> 
> This is useful to analyze the overall load of the system.
> 
> v2:
>  * Rebase.
>  * Drop floating point constant. (Chris Wilson)
> 
> v3:
>  * Change scale to 1024 for faster arithmetics. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin 

Reviewed-by: Chris Wilson 

Do we want these separate in the final push? Is there value in reverting
one but not the others? They seem a triumvirate.
-Chris