Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On 15/05/2023 23:07, Dixit, Ashutosh wrote: On Mon, 15 May 2023 03:10:56 -0700, Tvrtko Ursulin wrote: Hi Tvrtko, On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote: On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote: On 12/05/2023 02:08, Dixit, Ashutosh wrote: On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa 8< +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; u32 enable & u64 frequency_enabled_mask ugly but ok I guess? Or change enable to u64? making pmu->enable u64 as well as other places where it is assigned to local variables. Hmm.. yes very ugly. Could have been an accident which happened to work because there is a single timer (not per tile). Happened to work because the frequency mask does not spill over to the upper 32 bits (even for multi tile). - START_SECTION Similar issue in frequency_sampling_enabled too. Gt_id argument to it seems pointless. Not sure why it's pointless. We need the gt_id to determine the right mask for that specific gt. If it's not enabled, then we just return without pm_get and async put (like you mention later). And this piece of code is called within for_each_gt. I think I got a little confused cross referencing the code and patches last week and did not mentally see all the changes. Because the hunk in other_bit() is correctly adding support for per gt bits. The layout of pmu->enable ends up like this: bits 0 - 2: engine events bits 3 - 5: gt0 other events bits 6 - 8: gt1 other events bits 9 - 11: gt2 other events bits 12 - 14: gt3 other events Correct. So I now think whole frequency_enabled_mask() is just pointless and should be removed. And then pmu_needs_time code can stay as is. Possibly add a config_mask_32 helper which ensures no bits in upper 32 bits are returned. That is if we are happy for the frequency_sampling_enabled returning true for all gts, regardless of which ones actually have frequency sampling enabled. Or if we want to implement it as I probably have intended, we will need to add some gt bits into pmu->enable. Maybe reserve top four same as with config counters. Nope. What you have here works just fine. pmu->enable should not include any gt id info. gt_id[63:60] is only a concept for pmu config sent by user. config_mask and pmu->enable are i915 internal bookkeeping (bit masks) just to track what events need to be sampled. The 'other' bit masks are a function of gt_id because we use gt_id to calculate a contiguous numerical value for these 'other' events. That's all. Once the numerical value is calculated, there is no need for gt_id because config_mask is BIT_ULL(numerical_value). Since the numerical values never exceeded 31 (even for multi-gts), everything worked even with 32 bit pmu->enable. Yep. So question then is why make pmu->enable u64? The only reason was simplicity, since a lot of the existing code already assumes u64. E.g. if we keep pmu->enable u32, we should have to do the following: * Change config_mask() return type to u32 (in frequency_sampling_enabled(), we have 'pmu->enable & config_mask()') * Change frequency_enabled_mask() return type to u32 (again uses config_mask() so if we change config_mask() to u32 we change return type here too) * In i915_pmu_enable(), change 'pmu->enable |= BIT_ULL(bit)' to 'pmu->enable |= BIT(bit)' So yes, if we think we should be using pmu->enable u32, let's change this to be consistent everywhere. Instead frequency_enabled_mask() should be made u32 since the bitwise or composition of config_masks() is guaranteed to fit. At most it can have an internal u64 for the mask, assert upper_32_bits are zero and return lower_32_bits. Did I get it right this time round? :) Yes, though we'd have to make the config_mas
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Mon, 15 May 2023 03:10:56 -0700, Tvrtko Ursulin wrote: > Hi Tvrtko, > On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote: > > On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote: > >> > >> On 12/05/2023 02:08, Dixit, Ashutosh wrote: > >>> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: > > From: Tvrtko Ursulin > > Reserve some bits in the counter config namespace which will carry the > tile id and prepare the code to handle this. > > No per tile counters have been added yet. > > v2: > - Fix checkpatch issues > - Use 4 bits for gt id in non-engine counters. Drop FIXME. > - Set MAX GTs to 4. Drop FIXME. > > Signed-off-by: Tvrtko Ursulin > Signed-off-by: Umesh Nerlige Ramappa > > 8< > > +static u64 frequency_enabled_mask(void) > +{ > + unsigned int i; > + u64 mask = 0; > + > + for (i = 0; i < I915_PMU_MAX_GTS; i++) > + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | > + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); > + > + return mask; > +} > + > static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) > { > struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), > pmu); > @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, > bool gpu_active) > * Mask out all the ones which do not need the timer, or in > * other words keep all the ones that could need the timer. > */ > - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | > - config_mask(I915_PMU_REQUESTED_FREQUENCY) | > - ENGINE_SAMPLE_MASK; > + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; > >>> > >>> u32 enable & u64 frequency_enabled_mask > >>> > >>> ugly but ok I guess? Or change enable to u64? > > > > making pmu->enable u64 as well as other places where it is assigned to > > local variables. > > > >> > >> Hmm.. yes very ugly. Could have been an accident which happened to work > >> because there is a single timer (not per tile). > > > > Happened to work because the frequency mask does not spill over to the > > upper 32 bits (even for multi tile). > > > > - START_SECTION > >> > >> Similar issue in frequency_sampling_enabled too. Gt_id argument to it > >> seems pointless. > > > > Not sure why it's pointless. We need the gt_id to determine the right > > mask for that specific gt. If it's not enabled, then we just return > > without pm_get and async put (like you mention later). > > And this piece of code is called within for_each_gt. > > I think I got a little confused cross referencing the code and patches last > week and did not mentally see all the changes. > > Because the hunk in other_bit() is correctly adding support for per gt bits. > > The layout of pmu->enable ends up like this: > > bits 0 - 2: engine events > bits 3 - 5: gt0 other events > bits 6 - 8: gt1 other events > bits 9 - 11: gt2 other events > bits 12 - 14: gt3 other events Correct. > > >> So I now think whole frequency_enabled_mask() is just pointless and > >> should be removed. And then pmu_needs_time code can stay as is. Possibly > >> add a config_mask_32 helper which ensures no bits in upper 32 bits are > >> returned. > >> > >> That is if we are happy for the frequency_sampling_enabled returning > >> true for all gts, regardless of which ones actually have frequency > >> sampling enabled. > >> > >> Or if we want to implement it as I probably have intended, we will need > >> to add some gt bits into pmu->enable. Maybe reserve top four same as > >> with config counters. > > > > Nope. What you have here works just fine. pmu->enable should not include > > any gt id info. gt_id[63:60] is only a concept for pmu config sent by > > user. config_mask and pmu->enable are i915 internal bookkeeping (bit > > masks) just to track what events need to be sampled. The 'other' bit > > masks are a function of gt_id because we use gt_id to calculate a > > contiguous numerical value for these 'other' events. That's all. Once the > > numerical value is calculated, there is no need for gt_id because > > config_mask is BIT_ULL(numerical_value). Since the numerical values never > > exceeded 31 (even for multi-gts), everything worked even with 32 bit > > pmu->enable. > > Yep. > > So question then is why make pmu->enable u64? The only reason was simplicity, since a lot of the existing code already assumes u64. E.g. if we keep pmu->enable u32, we should have to do the following: * Change config_mask() return type to u32 (in frequency_sampling_enabled(), we have 'pmu->enable & config_mask()') * Change frequency_enabled_mask() return type to u32 (again uses config_mask() so if we change config_mask() to u32 we change return type here too) * In i915_pmu_enable(), change 'pmu->enable |= BIT_ULL(bit)' to
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Mon, May 15, 2023 at 11:12:33AM +0100, Tvrtko Ursulin wrote: On 15/05/2023 07:44, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. v3: (Ashutosh, Tvrtko) - Drop BUG_ON that would never fire - Make enable u64 - Pull in some code from next patch Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Ashutosh Dixit --- drivers/gpu/drm/i915/i915_pmu.c | 148 +++- drivers/gpu/drm/i915/i915_pmu.h | 11 ++- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 725b01b00775..b3dd9e51c5cc 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,7 +88,9 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) @@ -104,10 +116,22 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); - u32 enable; + u64 enable; /* * Only some counters need the sampling timer. @@ -120,9 +144,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; /* * When the GPU is idle per-engine counters do not need to be @@ -164,9 +186,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); } +static unsigned int +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; + + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); + + return idx; +} + +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; +} + +static void +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; +} + +static void +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul); +} + static u64 get_rc6(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; + const unsigned int gt_id = gt->info.id; struct i915_pmu *pmu = &i915->pmu; unsigned long flags; bool awake = false; @@ -181,7 +231,7 @@ static u64 get_rc6(struct intel_gt *gt) spin_lock_irqsave(&pmu->lock, flags); if (awake) { - pmu->sample[__I915_SAMPLE_RC6].cur = val; + store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val); } else { /* * We think we are runtime suspended. @@ -190,14 +240,14 @@ static u64 get_rc6(struct intel_gt *gt) * on top of the last known real value, as the approximated RC6 * counter value. */ - val = ktime_since_raw(pmu->sleep_last); - val += pmu->sample[__I915_SAMPLE_RC6].cur; +
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On 15/05/2023 07:44, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. v3: (Ashutosh, Tvrtko) - Drop BUG_ON that would never fire - Make enable u64 - Pull in some code from next patch Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Ashutosh Dixit --- drivers/gpu/drm/i915/i915_pmu.c | 148 +++- drivers/gpu/drm/i915/i915_pmu.h | 11 ++- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 725b01b00775..b3dd9e51c5cc 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,7 +88,9 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) @@ -104,10 +116,22 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); - u32 enable; + u64 enable; /* * Only some counters need the sampling timer. @@ -120,9 +144,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; /* * When the GPU is idle per-engine counters do not need to be @@ -164,9 +186,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); } +static unsigned int +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; + + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); + + return idx; +} + +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; +} + +static void +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; +} + +static void +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul); +} + static u64 get_rc6(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; + const unsigned int gt_id = gt->info.id; struct i915_pmu *pmu = &i915->pmu; unsigned long flags; bool awake = false; @@ -181,7 +231,7 @@ static u64 get_rc6(struct intel_gt *gt) spin_lock_irqsave(&pmu->lock, flags); if (awake) { - pmu->sample[__I915_SAMPLE_RC6].cur = val; + store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val); } else { /* * We think we are runtime suspended. @@ -190,14 +240,14 @@ static u64 get_rc6(struct intel_gt *gt) * on top of the last known real value, as the approximated RC6 * counter value. */ - val = ktime_since_raw(pmu->sleep_last); - val += pmu->sample[__I915_SAMPLE_RC6].cur; + val = ktime_since_raw(pmu->slee
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On 12/05/2023 21:57, Umesh Nerlige Ramappa wrote: On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote: On 12/05/2023 02:08, Dixit, Ashutosh wrote: On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa 8< +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; u32 enable & u64 frequency_enabled_mask ugly but ok I guess? Or change enable to u64? making pmu->enable u64 as well as other places where it is assigned to local variables. Hmm.. yes very ugly. Could have been an accident which happened to work because there is a single timer (not per tile). Happened to work because the frequency mask does not spill over to the upper 32 bits (even for multi tile). - START_SECTION Similar issue in frequency_sampling_enabled too. Gt_id argument to it seems pointless. Not sure why it's pointless. We need the gt_id to determine the right mask for that specific gt. If it's not enabled, then we just return without pm_get and async put (like you mention later). And this piece of code is called within for_each_gt. I think I got a little confused cross referencing the code and patches last week and did not mentally see all the changes. Because the hunk in other_bit() is correctly adding support for per gt bits. The layout of pmu->enable ends up like this: bits 0 - 2: engine events bits 3 - 5: gt0 other events bits 6 - 8: gt1 other events bits 9 - 11: gt2 other events bits 12 - 14: gt3 other events So I now think whole frequency_enabled_mask() is just pointless and should be removed. And then pmu_needs_time code can stay as is. Possibly add a config_mask_32 helper which ensures no bits in upper 32 bits are returned. That is if we are happy for the frequency_sampling_enabled returning true for all gts, regardless of which ones actually have frequency sampling enabled. Or if we want to implement it as I probably have intended, we will need to add some gt bits into pmu->enable. Maybe reserve top four same as with config counters. Nope. What you have here works just fine. pmu->enable should not include any gt id info. gt_id[63:60] is only a concept for pmu config sent by user. config_mask and pmu->enable are i915 internal bookkeeping (bit masks) just to track what events need to be sampled. The 'other' bit masks are a function of gt_id because we use gt_id to calculate a contiguous numerical value for these 'other' events. That's all. Once the numerical value is calculated, there is no need for gt_id because config_mask is BIT_ULL(numerical_value). Since the numerical values never exceeded 31 (even for multi-gts), everything worked even with 32 bit pmu->enable. Yep. So question then is why make pmu->enable u64? Instead frequency_enabled_mask() should be made u32 since the bitwise or composition of config_masks() is guaranteed to fit. At most it can have an internal u64 for the mask, assert upper_32_bits are zero and return lower_32_bits. Did I get it right this time round? :) Regards, Tvrtko
[Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. v3: (Ashutosh, Tvrtko) - Drop BUG_ON that would never fire - Make enable u64 - Pull in some code from next patch Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Ashutosh Dixit --- drivers/gpu/drm/i915/i915_pmu.c | 148 +++- drivers/gpu/drm/i915/i915_pmu.h | 11 ++- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 725b01b00775..b3dd9e51c5cc 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,7 +88,9 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) @@ -104,10 +116,22 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); - u32 enable; + u64 enable; /* * Only some counters need the sampling timer. @@ -120,9 +144,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; /* * When the GPU is idle per-engine counters do not need to be @@ -164,9 +186,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); } +static unsigned int +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; + + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); + + return idx; +} + +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; +} + +static void +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; +} + +static void +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul); +} + static u64 get_rc6(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; + const unsigned int gt_id = gt->info.id; struct i915_pmu *pmu = &i915->pmu; unsigned long flags; bool awake = false; @@ -181,7 +231,7 @@ static u64 get_rc6(struct intel_gt *gt) spin_lock_irqsave(&pmu->lock, flags); if (awake) { - pmu->sample[__I915_SAMPLE_RC6].cur = val; + store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val); } else { /* * We think we are runtime suspended. @@ -190,14 +240,14 @@ static u64 get_rc6(struct intel_gt *gt) * on top of the last known real value, as the approximated RC6 * counter value. */ - val = ktime_since_raw(pmu->sleep_last); - val += pmu->sample[__I915_SAMPLE_RC6].cur; + val = ktime_since_raw(pmu->sleep_last[gt_id]); + val += read_sample(pmu, gt_i
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Fri, May 12, 2023 at 09:41:56PM -0700, Dixit, Ashutosh wrote: On Fri, 12 May 2023 18:55:44 -0700, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. v3: (Ashutosh, Tvrtko) - Drop BUG_ON that would never fire - Make enable u64 - Pull in some code from next patch Just a reminder in case you want to do something like: #define I915_PMU_MAX_GTS I915_MAX_GT Or replace I915_PMU_MAX_GTS by I915_MAX_GT. Hmmm, I thought I sent out a response separately for that in the previous series, but I am not able to locate it, strange. Anyways, I did try that and ran into issues that Tvrtko was mentioning w.r.t. header dependencies. I think i915_drv.h includes intel_engine.h and that includes i915_pmu.h. So including i915_drv.h in i915_pmu.h for the definition of I915_MAX_GT is just wreaking havoc during compile. Hence, gave up on that and using whatever existed before in Tvrtko's patch. Thanks, Umesh But otherwise v3 LGTM: Reviewed-by: Ashutosh Dixit Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_pmu.c | 148 +++- drivers/gpu/drm/i915/i915_pmu.h | 11 ++- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 725b01b00775..b3dd9e51c5cc 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,7 +88,9 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) @@ -104,10 +116,22 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); - u32 enable; + u64 enable; /* * Only some counters need the sampling timer. @@ -120,9 +144,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; /* * When the GPU is idle per-engine counters do not need to be @@ -164,9 +186,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); } +static unsigned int +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; + + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); + + return idx; +} + +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; +} + +static void +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; +} + +static void +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul); +} + static u64 get_rc6(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; + const unsigned int gt_id = gt->info.id; struct i915_pmu *pmu = &i915->pmu; unsigned
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Fri, 12 May 2023 18:55:44 -0700, Umesh Nerlige Ramappa wrote: > > From: Tvrtko Ursulin > > Reserve some bits in the counter config namespace which will carry the > tile id and prepare the code to handle this. > > No per tile counters have been added yet. > > v2: > - Fix checkpatch issues > - Use 4 bits for gt id in non-engine counters. Drop FIXME. > - Set MAX GTs to 4. Drop FIXME. > > v3: (Ashutosh, Tvrtko) > - Drop BUG_ON that would never fire > - Make enable u64 > - Pull in some code from next patch Just a reminder in case you want to do something like: #define I915_PMU_MAX_GTS I915_MAX_GT Or replace I915_PMU_MAX_GTS by I915_MAX_GT. But otherwise v3 LGTM: Reviewed-by: Ashutosh Dixit > Signed-off-by: Tvrtko Ursulin > Signed-off-by: Umesh Nerlige Ramappa > --- > drivers/gpu/drm/i915/i915_pmu.c | 148 +++- > drivers/gpu/drm/i915/i915_pmu.h | 11 ++- > include/uapi/drm/i915_drm.h | 17 +++- > 3 files changed, 129 insertions(+), 47 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c > index 725b01b00775..b3dd9e51c5cc 100644 > --- a/drivers/gpu/drm/i915/i915_pmu.c > +++ b/drivers/gpu/drm/i915/i915_pmu.c > @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) > return config < __I915_PMU_OTHER(0); > } > > +static unsigned int config_gt_id(const u64 config) > +{ > + return config >> __I915_PMU_GT_SHIFT; > +} > + > +static u64 config_counter(const u64 config) > +{ > + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); > +} > + > static unsigned int other_bit(const u64 config) > { > unsigned int val; > > - switch (config) { > + switch (config_counter(config)) { > case I915_PMU_ACTUAL_FREQUENCY: > val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; > break; > @@ -78,7 +88,9 @@ static unsigned int other_bit(const u64 config) > return -1; > } > > - return I915_ENGINE_SAMPLE_COUNT + val; > + return I915_ENGINE_SAMPLE_COUNT + > +config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + > +val; > } > > static unsigned int config_bit(const u64 config) > @@ -104,10 +116,22 @@ static unsigned int event_bit(struct perf_event *event) > return config_bit(event->attr.config); > } > > +static u64 frequency_enabled_mask(void) > +{ > + unsigned int i; > + u64 mask = 0; > + > + for (i = 0; i < I915_PMU_MAX_GTS; i++) > + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | > + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); > + > + return mask; > +} > + > static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) > { > struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); > - u32 enable; > + u64 enable; > > /* >* Only some counters need the sampling timer. > @@ -120,9 +144,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool > gpu_active) >* Mask out all the ones which do not need the timer, or in >* other words keep all the ones that could need the timer. >*/ > - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | > - config_mask(I915_PMU_REQUESTED_FREQUENCY) | > - ENGINE_SAMPLE_MASK; > + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; > > /* >* When the GPU is idle per-engine counters do not need to be > @@ -164,9 +186,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) > return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); > } > > +static unsigned int > +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) > +{ > + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; > + > + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); > + > + return idx; > +} > + > +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) > +{ > + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; > +} > + > +static void > +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) > +{ > + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; > +} > + > +static void > +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 > val, u32 mul) > +{ > + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, > mul); > +} > + > static u64 get_rc6(struct intel_gt *gt) > { > struct drm_i915_private *i915 = gt->i915; > + const unsigned int gt_id = gt->info.id; > struct i915_pmu *pmu = &i915->pmu; > unsigned long flags; > bool awake = false; > @@ -181,7 +231,7 @@ static u64 get_rc6(struct intel_gt *gt) > spin_lock_irqsave(&pmu->lock, flags); > > if (awake) { > - pmu->sample[__I915_SAMPLE_RC6].cur = val; > + store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val); > } else { > /* >* We think we are runtime suspended. > @@ -190,14 +240,14 @@ st
[Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. v3: (Ashutosh, Tvrtko) - Drop BUG_ON that would never fire - Make enable u64 - Pull in some code from next patch Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_pmu.c | 148 +++- drivers/gpu/drm/i915/i915_pmu.h | 11 ++- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 725b01b00775..b3dd9e51c5cc 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,7 +88,9 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) @@ -104,10 +116,22 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); - u32 enable; + u64 enable; /* * Only some counters need the sampling timer. @@ -120,9 +144,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; /* * When the GPU is idle per-engine counters do not need to be @@ -164,9 +186,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); } +static unsigned int +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; + + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); + + return idx; +} + +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; +} + +static void +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; +} + +static void +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul); +} + static u64 get_rc6(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; + const unsigned int gt_id = gt->info.id; struct i915_pmu *pmu = &i915->pmu; unsigned long flags; bool awake = false; @@ -181,7 +231,7 @@ static u64 get_rc6(struct intel_gt *gt) spin_lock_irqsave(&pmu->lock, flags); if (awake) { - pmu->sample[__I915_SAMPLE_RC6].cur = val; + store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val); } else { /* * We think we are runtime suspended. @@ -190,14 +240,14 @@ static u64 get_rc6(struct intel_gt *gt) * on top of the last known real value, as the approximated RC6 * counter value. */ - val = ktime_since_raw(pmu->sleep_last); - val += pmu->sample[__I915_SAMPLE_RC6].cur; + val = ktime_since_raw(pmu->sleep_last[gt_id]); + val += read_sample(pmu, gt_id, __I915_SAMPLE_RC6);
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Fri, 12 May 2023 13:57:59 -0700, Umesh Nerlige Ramappa wrote: > > On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote: > > > > On 12/05/2023 02:08, Dixit, Ashutosh wrote: > >> On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: > >>> > >>> From: Tvrtko Ursulin > >>> > >>> Reserve some bits in the counter config namespace which will carry the > >>> tile id and prepare the code to handle this. > >>> > >>> No per tile counters have been added yet. > >>> > >>> v2: > >>> - Fix checkpatch issues > >>> - Use 4 bits for gt id in non-engine counters. Drop FIXME. > >>> - Set MAX GTs to 4. Drop FIXME. > >>> > >>> Signed-off-by: Tvrtko Ursulin > >>> Signed-off-by: Umesh Nerlige Ramappa > >>> --- > >>> drivers/gpu/drm/i915/i915_pmu.c | 150 +++- > >>> drivers/gpu/drm/i915/i915_pmu.h | 9 +- > >>> include/uapi/drm/i915_drm.h | 17 +++- > >>> 3 files changed, 129 insertions(+), 47 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c > >>> b/drivers/gpu/drm/i915/i915_pmu.c > >>> index 669a42e44082..12b2f3169abf 100644 > >>> --- a/drivers/gpu/drm/i915/i915_pmu.c > >>> +++ b/drivers/gpu/drm/i915/i915_pmu.c > >>> @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) > >>> return config < __I915_PMU_OTHER(0); > >>> } > >>> > >>> +static unsigned int config_gt_id(const u64 config) > >>> +{ > >>> + return config >> __I915_PMU_GT_SHIFT; > >>> +} > >>> + > >>> +static u64 config_counter(const u64 config) > >>> +{ > >>> + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); > >> > >> ok, but another possibility: > >> > >>return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT); > > > > It's not a register so no. :) GENMASK_ULL maybe but meh. > > leaving as is. > > > > >>> +} > >>> + > >>> static unsigned int other_bit(const u64 config) > >>> { > >>> unsigned int val; > >>> > >>> - switch (config) { > >>> + switch (config_counter(config)) { > >>> case I915_PMU_ACTUAL_FREQUENCY: > >>> val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; > >>> break; > >>> @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config) > >>> return -1; > >>> } > >>> > >>> - return I915_ENGINE_SAMPLE_COUNT + val; > >>> + return I915_ENGINE_SAMPLE_COUNT + > >>> +config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + > >>> +val; > >>> } > >>> > >>> static unsigned int config_bit(const u64 config) > >>> { > >>> - if (is_engine_config(config)) > >>> + if (is_engine_config(config)) { > >>> + GEM_BUG_ON(config_gt_id(config)); > >> > >> This GEM_BUG_ON is not needed since: > >> > >>static bool is_engine_config(u64 config) > >>{ > >>return config < __I915_PMU_OTHER(0); > >>} > > > > True! > > dropping BUG_ON > > > > >>> + > >>> return engine_config_sample(config); > >>> - else > >>> + } else { > >>> return other_bit(config); > >>> + } > >>> } > >>> > >>> static u64 config_mask(u64 config) > >>> @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event > >>> *event) > >>> return config_bit(event->attr.config); > >>> } > >>> > >>> +static u64 frequency_enabled_mask(void) > >>> +{ > >>> + unsigned int i; > >>> + u64 mask = 0; > >>> + > >>> + for (i = 0; i < I915_PMU_MAX_GTS; i++) > >>> + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | > >>> + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); > >>> + > >>> + return mask; > >>> +} > >>> + > >>> static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) > >>> { > >>> struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); > >>> @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, > >>> bool gpu_active) > >>>* Mask out all the ones which do not need the timer, or in > >>>* other words keep all the ones that could need the timer. > >>>*/ > >>> - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | > >>> - config_mask(I915_PMU_REQUESTED_FREQUENCY) | > >>> - ENGINE_SAMPLE_MASK; > >>> + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; > >> > >> u32 enable & u64 frequency_enabled_mask > >> > >> ugly but ok I guess? Or change enable to u64? > > making pmu->enable u64 as well as other places where it is assigned to > local variables. Yes, that's the way to do it. > > > > > Hmm.. yes very ugly. Could have been an accident which happened to work > > because there is a single timer (not per tile). > > Happened to work because the frequency mask does not spill over to the > upper 32 bits (even for multi tile). Even with 4 tiles, I checked. > > - START_SECTION > > > > Similar issue in frequency_sampling_enabled too. Gt_id argument to it > > seems pointless. > > Not sure why it's pointless. We need the gt_id to determine the right mask > for that specific gt. If it's not enabled, then we just return without > pm_get and async put (like you mention later). > And this pi
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Fri, May 12, 2023 at 01:57:59PM -0700, Umesh Nerlige Ramappa wrote: On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote: On 12/05/2023 02:08, Dixit, Ashutosh wrote: On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_pmu.c | 150 +++- drivers/gpu/drm/i915/i915_pmu.h | 9 +- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 669a42e44082..12b2f3169abf 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); ok, but another possibility: return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT); It's not a register so no. :) GENMASK_ULL maybe but meh. leaving as is. +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) { - if (is_engine_config(config)) + if (is_engine_config(config)) { + GEM_BUG_ON(config_gt_id(config)); This GEM_BUG_ON is not needed since: static bool is_engine_config(u64 config) { return config < __I915_PMU_OTHER(0); } True! dropping BUG_ON + return engine_config_sample(config); - else + } else { return other_bit(config); + } } static u64 config_mask(u64 config) @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; u32 enable & u64 frequency_enabled_mask ugly but ok I guess? Or change enable to u64? making pmu->enable u64 as well as other places where it is assigned to local variables. Hmm.. yes very ugly. Could have been an accident which happened to work because there is a single timer (not per tile). Happened to work because the frequency mask does not spill over to the upper 32 bits (even for multi tile). - START_SECTION Similar issue in frequency_sampling_enabled too. Gt_id argument to it seems pointless. Not sure why it's pointless. We need the gt_id to determine the right mask for that specific gt. If it's not enabled, then we just return without pm_get and async put (like you mention later). And this piece of code is called within for_each_gt. So I now think whole frequency_enabled_mask() is just pointless and should be removed. And then pmu_needs_time code can stay as is. Possibly add a config_mask_32 helper which ensures no bits in upper 32 bits are returned. That is if we are happy for the frequency_sampling_enabled returning true for all gts, regardless of which ones actually have frequency sampling enabled. Or if we want to implement it as I probably have intended, we will need to add some gt bits into pmu->enable. Maybe re
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Fri, May 12, 2023 at 11:56:18AM +0100, Tvrtko Ursulin wrote: On 12/05/2023 02:08, Dixit, Ashutosh wrote: On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_pmu.c | 150 +++- drivers/gpu/drm/i915/i915_pmu.h | 9 +- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 669a42e44082..12b2f3169abf 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); ok, but another possibility: return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT); It's not a register so no. :) GENMASK_ULL maybe but meh. leaving as is. +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) { - if (is_engine_config(config)) + if (is_engine_config(config)) { + GEM_BUG_ON(config_gt_id(config)); This GEM_BUG_ON is not needed since: static bool is_engine_config(u64 config) { return config < __I915_PMU_OTHER(0); } True! dropping BUG_ON + return engine_config_sample(config); - else + } else { return other_bit(config); + } } static u64 config_mask(u64 config) @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; u32 enable & u64 frequency_enabled_mask ugly but ok I guess? Or change enable to u64? making pmu->enable u64 as well as other places where it is assigned to local variables. Hmm.. yes very ugly. Could have been an accident which happened to work because there is a single timer (not per tile). Happened to work because the frequency mask does not spill over to the upper 32 bits (even for multi tile). - START_SECTION Similar issue in frequency_sampling_enabled too. Gt_id argument to it seems pointless. Not sure why it's pointless. We need the gt_id to determine the right mask for that specific gt. If it's not enabled, then we just return without pm_get and async put (like you mention later). And this piece of code is called within for_each_gt. So I now think whole frequency_enabled_mask() is just pointless and should be removed. And then pmu_needs_time code can stay as is. Possibly add a config_mask_32 helper which ensures no bits in upper 32 bits are returned. That is if we are happy for the frequency_sampling_enabled returning true for all gts, regardless of which ones actually have frequency sampling enabled. Or if we want to implement it as I probably have intended, we will need to add some gt bits into pmu->enable. Maybe reserve top four same as with config counters. Nope. Wh
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On 12/05/2023 02:08, Dixit, Ashutosh wrote: On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_pmu.c | 150 +++- drivers/gpu/drm/i915/i915_pmu.h | 9 +- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 669a42e44082..12b2f3169abf 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); ok, but another possibility: return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT); It's not a register so no. :) GENMASK_ULL maybe but meh. +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) { - if (is_engine_config(config)) + if (is_engine_config(config)) { + GEM_BUG_ON(config_gt_id(config)); This GEM_BUG_ON is not needed since: static bool is_engine_config(u64 config) { return config < __I915_PMU_OTHER(0); } True! + return engine_config_sample(config); - else + } else { return other_bit(config); + } } static u64 config_mask(u64 config) @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; u32 enable & u64 frequency_enabled_mask ugly but ok I guess? Or change enable to u64? Hmm.. yes very ugly. Could have been an accident which happened to work because there is a single timer (not per tile). Similar issue in frequency_sampling_enabled too. Gt_id argument to it seems pointless. So I now think whole frequency_enabled_mask() is just pointless and should be removed. And then pmu_needs_time code can stay as is. Possibly add a config_mask_32 helper which ensures no bits in upper 32 bits are returned. That is if we are happy for the frequency_sampling_enabled returning true for all gts, regardless of which ones actually have frequency sampling enabled. Or if we want to implement it as I probably have intended, we will need to add some gt bits into pmu->enable. Maybe reserve top four same as with config counters. In this case the config_mask needs to be updated to translate not just the config counter into the pmu tracked event bits, but config counter gt id into the pmu->enabled gt id. Sounds easily doable on a first thought. /* * When the GPU is idle per-engine counters do not need to be @@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); } +static unsigned int +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; +
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Fri, 05 May 2023 17:58:15 -0700, Umesh Nerlige Ramappa wrote: > > From: Tvrtko Ursulin > > Reserve some bits in the counter config namespace which will carry the > tile id and prepare the code to handle this. > > No per tile counters have been added yet. > > v2: > - Fix checkpatch issues > - Use 4 bits for gt id in non-engine counters. Drop FIXME. > - Set MAX GTs to 4. Drop FIXME. > > Signed-off-by: Tvrtko Ursulin > Signed-off-by: Umesh Nerlige Ramappa > --- > drivers/gpu/drm/i915/i915_pmu.c | 150 +++- > drivers/gpu/drm/i915/i915_pmu.h | 9 +- > include/uapi/drm/i915_drm.h | 17 +++- > 3 files changed, 129 insertions(+), 47 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c > index 669a42e44082..12b2f3169abf 100644 > --- a/drivers/gpu/drm/i915/i915_pmu.c > +++ b/drivers/gpu/drm/i915/i915_pmu.c > @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) > return config < __I915_PMU_OTHER(0); > } > > +static unsigned int config_gt_id(const u64 config) > +{ > + return config >> __I915_PMU_GT_SHIFT; > +} > + > +static u64 config_counter(const u64 config) > +{ > + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); ok, but another possibility: return config & ~REG_GENMASK64(63, __I915_PMU_GT_SHIFT); > +} > + > static unsigned int other_bit(const u64 config) > { > unsigned int val; > > - switch (config) { > + switch (config_counter(config)) { > case I915_PMU_ACTUAL_FREQUENCY: > val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; > break; > @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config) > return -1; > } > > - return I915_ENGINE_SAMPLE_COUNT + val; > + return I915_ENGINE_SAMPLE_COUNT + > +config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + > +val; > } > > static unsigned int config_bit(const u64 config) > { > - if (is_engine_config(config)) > + if (is_engine_config(config)) { > + GEM_BUG_ON(config_gt_id(config)); This GEM_BUG_ON is not needed since: static bool is_engine_config(u64 config) { return config < __I915_PMU_OTHER(0); } > + > return engine_config_sample(config); > - else > + } else { > return other_bit(config); > + } > } > > static u64 config_mask(u64 config) > @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event) > return config_bit(event->attr.config); > } > > +static u64 frequency_enabled_mask(void) > +{ > + unsigned int i; > + u64 mask = 0; > + > + for (i = 0; i < I915_PMU_MAX_GTS; i++) > + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | > + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); > + > + return mask; > +} > + > static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) > { > struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); > @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool > gpu_active) >* Mask out all the ones which do not need the timer, or in >* other words keep all the ones that could need the timer. >*/ > - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | > - config_mask(I915_PMU_REQUESTED_FREQUENCY) | > - ENGINE_SAMPLE_MASK; > + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; u32 enable & u64 frequency_enabled_mask ugly but ok I guess? Or change enable to u64? > > /* >* When the GPU is idle per-engine counters do not need to be > @@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) > return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); > } > > +static unsigned int > +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) > +{ > + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; > + > + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); Does this GEM_BUG_ON need to be split up as follows: GEM_BUG_ON(gt_id >= I915_PMU_MAX_GTS); GEM_BUG_ON(sample >= __I915_NUM_PMU_SAMPLERS); Since that is what we really mean here isn't it? > + > + return idx; > +} > + > +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) > +{ > + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; > +} > + > +static void > +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) > +{ > + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; > +} > + > +static void > +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 > val, u32 mul) > +{ > + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, > mul); > +} Gripe: I think this code should have per event data structures which store all information about a particular event. Rather than storing it in these arrays common to all
Re: [Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
On Fri, May 05, 2023 at 05:58:15PM -0700, Umesh Nerlige Ramappa wrote: From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. I touched this one, so cannot review it. Also a reminder to myself to add the UMD changes (intel_gpu_top) link here. Umesh Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_pmu.c | 150 +++- drivers/gpu/drm/i915/i915_pmu.h | 9 +- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 669a42e44082..12b2f3169abf 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) { - if (is_engine_config(config)) + if (is_engine_config(config)) { + GEM_BUG_ON(config_gt_id(config)); + return engine_config_sample(config); - else + } else { return other_bit(config); + } } static u64 config_mask(u64 config) @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; /* * When the GPU is idle per-engine counters do not need to be @@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); } +static unsigned int +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; + + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); + + return idx; +} + +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; +} + +static void +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; +} + +static void +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul); +} + static u64 get_rc6(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; + const unsigned int gt_id = gt->info.id; struct i915_pmu *pmu = &i915->pmu; unsigned long flags; bool awake = false; @@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt) spin_lock_irqsave(&pmu->lock, flags); if (awake) { - pmu->sample[__I915_SAMPLE_RC6].cur = val; + store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val); } else { /* * We think we are runtime suspended. @@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt) * on top of the last known real value, as the approximated RC6 * counter value.
[Intel-gfx] [PATCH 5/6] drm/i915/pmu: Prepare for multi-tile non-engine counters
From: Tvrtko Ursulin Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. Signed-off-by: Tvrtko Ursulin Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_pmu.c | 150 +++- drivers/gpu/drm/i915/i915_pmu.h | 9 +- include/uapi/drm/i915_drm.h | 17 +++- 3 files changed, 129 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 669a42e44082..12b2f3169abf 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -56,11 +56,21 @@ static bool is_engine_config(u64 config) return config < __I915_PMU_OTHER(0); } +static unsigned int config_gt_id(const u64 config) +{ + return config >> __I915_PMU_GT_SHIFT; +} + +static u64 config_counter(const u64 config) +{ + return config & ~(~0ULL << __I915_PMU_GT_SHIFT); +} + static unsigned int other_bit(const u64 config) { unsigned int val; - switch (config) { + switch (config_counter(config)) { case I915_PMU_ACTUAL_FREQUENCY: val = __I915_PMU_ACTUAL_FREQUENCY_ENABLED; break; @@ -78,15 +88,20 @@ static unsigned int other_bit(const u64 config) return -1; } - return I915_ENGINE_SAMPLE_COUNT + val; + return I915_ENGINE_SAMPLE_COUNT + + config_gt_id(config) * __I915_PMU_TRACKED_EVENT_COUNT + + val; } static unsigned int config_bit(const u64 config) { - if (is_engine_config(config)) + if (is_engine_config(config)) { + GEM_BUG_ON(config_gt_id(config)); + return engine_config_sample(config); - else + } else { return other_bit(config); + } } static u64 config_mask(u64 config) @@ -104,6 +119,18 @@ static unsigned int event_bit(struct perf_event *event) return config_bit(event->attr.config); } +static u64 frequency_enabled_mask(void) +{ + unsigned int i; + u64 mask = 0; + + for (i = 0; i < I915_PMU_MAX_GTS; i++) + mask |= config_mask(__I915_PMU_ACTUAL_FREQUENCY(i)) | + config_mask(__I915_PMU_REQUESTED_FREQUENCY(i)); + + return mask; +} + static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) { struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu); @@ -120,9 +147,7 @@ static bool pmu_needs_timer(struct i915_pmu *pmu, bool gpu_active) * Mask out all the ones which do not need the timer, or in * other words keep all the ones that could need the timer. */ - enable &= config_mask(I915_PMU_ACTUAL_FREQUENCY) | - config_mask(I915_PMU_REQUESTED_FREQUENCY) | - ENGINE_SAMPLE_MASK; + enable &= frequency_enabled_mask() | ENGINE_SAMPLE_MASK; /* * When the GPU is idle per-engine counters do not need to be @@ -164,9 +189,37 @@ static inline s64 ktime_since_raw(const ktime_t kt) return ktime_to_ns(ktime_sub(ktime_get_raw(), kt)); } +static unsigned int +__sample_idx(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + unsigned int idx = gt_id * __I915_NUM_PMU_SAMPLERS + sample; + + GEM_BUG_ON(idx >= ARRAY_SIZE(pmu->sample)); + + return idx; +} + +static u64 read_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample) +{ + return pmu->sample[__sample_idx(pmu, gt_id, sample)].cur; +} + +static void +store_sample(struct i915_pmu *pmu, unsigned int gt_id, int sample, u64 val) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur = val; +} + +static void +add_sample_mult(struct i915_pmu *pmu, unsigned int gt_id, int sample, u32 val, u32 mul) +{ + pmu->sample[__sample_idx(pmu, gt_id, sample)].cur += mul_u32_u32(val, mul); +} + static u64 get_rc6(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; + const unsigned int gt_id = gt->info.id; struct i915_pmu *pmu = &i915->pmu; unsigned long flags; bool awake = false; @@ -181,7 +234,7 @@ static u64 get_rc6(struct intel_gt *gt) spin_lock_irqsave(&pmu->lock, flags); if (awake) { - pmu->sample[__I915_SAMPLE_RC6].cur = val; + store_sample(pmu, gt_id, __I915_SAMPLE_RC6, val); } else { /* * We think we are runtime suspended. @@ -190,14 +243,14 @@ static u64 get_rc6(struct intel_gt *gt) * on top of the last known real value, as the approximated RC6 * counter value. */ - val = ktime_since_raw(pmu->sleep_last); - val += pmu->sample[__I915_SAMPLE_RC6].cur; + val = ktime_since_raw(pmu->sleep_