RE: [PATCH 2/3] drm/amd/amdgpu: Define and implement a function that collects number of waves that are in flight.
[AMD Public Use]

Some minor typos

> -Original Message-
> From: amd-gfx On Behalf Of Ramesh Errabolu
> Sent: Friday, September 25, 2020 6:03 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Errabolu, Ramesh
> Subject: [PATCH 2/3] drm/amd/amdgpu: Define and implement a function that collects number of waves that are in flight.
>
> [Why]
> Allow user to know how many compute units (CU) are in use at any given
> moment.
>
> [How]
> Read registers of SQ that give number of waves that are in flight
> of various queues. Use this information to determine number of CU's
> in use.
>
> Signed-off-by: Ramesh Errabolu
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 176 +-
>  .../gpu/drm/amd/include/kgd_kfd_interface.h | 12 ++
>  2 files changed, 187 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index e6aede725197..87d4c8855805 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -38,7 +38,7 @@
>  #include "soc15d.h"
>  #include "mmhub_v1_0.h"
>  #include "gfxhub_v1_0.h"
> -
> +#include "gfx_v9_0.h"
>
>  enum hqd_dequeue_request_type {
>  	NO_ACTION = 0,
> @@ -706,6 +706,179 @@ void kgd_gfx_v9_set_vm_context_page_table_base(struct kgd_dev *kgd,
>  	gfxhub_v1_0_setup_vm_pt_regs(adev, vmid, page_table_base);
>  }
>
> +static void lock_spi_csq_mutexes(struct amdgpu_device *adev)
> +{
> +	mutex_lock(&adev->srbm_mutex);
> +	mutex_lock(&adev->grbm_idx_mutex);
> +
> +}
> +
> +static void unlock_spi_csq_mutexes(struct amdgpu_device *adev)
> +{
> +	mutex_unlock(&adev->grbm_idx_mutex);
> +	mutex_unlock(&adev->srbm_mutex);
> +}
> +
> +/**
> + * @get_wave_count: Read device registers to get number of waves in flight for
> + * a particulare queue. The method also returns the VMID associated with the
particular
> + * queue.
> + *
> + * @adev: Handle of device whose registers are to be read
> + * @queue_idx: Index of queue in the queue-map bit-field
> + * @wave_cnt: Output parameter updated with number of waves in flight
> + * @vmid: Output parameter updated with VMID of queue whose wave count
> + * is being collected
> + */
> +static void get_wave_count(struct amdgpu_device *adev, int queue_idx,
> +		int *wave_cnt, int *vmid)
> +{
> +	int pipe_idx;
> +	int queue_slot;
> +	unsigned int reg_val;
> +
> +	/*
> +	 * Program GRBM with appropriate MEID, PIPEID, QUEUEID and VMID
> +	 * parameters to read out waves in flight. Get VMID if there are
> +	 * non-zero waves in flight.
> +	 */
> +	*vmid = 0xFF;
> +	*wave_cnt = 0;
> +	pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
> +	queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
> +	soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
> +	reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
> +			 queue_slot);
> +	*wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
> +	if (*wave_cnt != 0)
> +		*vmid = (RREG32_SOC15(GC, 0, mmCP_HQD_VMID) &
> +			CP_HQD_VMID__VMID_MASK) >> CP_HQD_VMID__VMID__SHIFT;
> +}
> +
> +/**
> + * @kgd_gfx_v9_get_cu_occupancy: Reads relevant registers associated with each
> + * shader engine and aggregates the number of waves that are in fight for the
in flight
> + * process whose pasid is provided as a parameter. The process could have ZERO
> + * or more queues running and submitting waves to compute units.
> + *
> + * @kgd: Handle of device from which to get number of waves in flight
> + * @pasid: Identifies the process for which this query call is invoked
> + * @wave_cnt: Output parameter updated with number of waves in flight that
> + * belong to process with given pasid
> + * @max_waves_per_cu: Output parameter updated with maximum number of waves
> + * possible per Compute Unit
> + *
> + * @note: It's possible that the device has too many queues (oversubscription)
> + * in which case a VMID could be remapped to a different PASID. This could lead
> + * to in accurate wave count. Following is a high-level sequence:
to an inaccurate
> + *	Time T1: vmid = getVmid(); vmid is associated with Pasid P1
> + *	Time T2: passId = getPasId(vmid); vmid is associated with Pasid P2
> + * In the sequence above wave count obtained from time T1 will be incorrectly
> + * lost or added to total wav
[PATCH 2/3] drm/amd/amdgpu: Define and implement a function that collects number of waves that are in flight.
[Why]
Allow user to know how many compute units (CU) are in use at any given
moment.

[How]
Read registers of SQ that give number of waves that are in flight
of various queues. Use this information to determine number of CU's
in use.

Signed-off-by: Ramesh Errabolu
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 176 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h | 12 ++
 2 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index e6aede725197..87d4c8855805 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -38,7 +38,7 @@
 #include "soc15d.h"
 #include "mmhub_v1_0.h"
 #include "gfxhub_v1_0.h"
-
+#include "gfx_v9_0.h"

 enum hqd_dequeue_request_type {
 	NO_ACTION = 0,
@@ -706,6 +706,179 @@ void kgd_gfx_v9_set_vm_context_page_table_base(struct kgd_dev *kgd,
 	gfxhub_v1_0_setup_vm_pt_regs(adev, vmid, page_table_base);
 }

+static void lock_spi_csq_mutexes(struct amdgpu_device *adev)
+{
+	mutex_lock(&adev->srbm_mutex);
+	mutex_lock(&adev->grbm_idx_mutex);
+
+}
+
+static void unlock_spi_csq_mutexes(struct amdgpu_device *adev)
+{
+	mutex_unlock(&adev->grbm_idx_mutex);
+	mutex_unlock(&adev->srbm_mutex);
+}
+
+/**
+ * @get_wave_count: Read device registers to get number of waves in flight for
+ * a particulare queue. The method also returns the VMID associated with the
+ * queue.
+ *
+ * @adev: Handle of device whose registers are to be read
+ * @queue_idx: Index of queue in the queue-map bit-field
+ * @wave_cnt: Output parameter updated with number of waves in flight
+ * @vmid: Output parameter updated with VMID of queue whose wave count
+ * is being collected
+ */
+static void get_wave_count(struct amdgpu_device *adev, int queue_idx,
+		int *wave_cnt, int *vmid)
+{
+	int pipe_idx;
+	int queue_slot;
+	unsigned int reg_val;
+
+	/*
+	 * Program GRBM with appropriate MEID, PIPEID, QUEUEID and VMID
+	 * parameters to read out waves in flight. Get VMID if there are
+	 * non-zero waves in flight.
+	 */
+	*vmid = 0xFF;
+	*wave_cnt = 0;
+	pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
+	queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
+	soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
+	reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
+			 queue_slot);
+	*wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
+	if (*wave_cnt != 0)
+		*vmid = (RREG32_SOC15(GC, 0, mmCP_HQD_VMID) &
+			CP_HQD_VMID__VMID_MASK) >> CP_HQD_VMID__VMID__SHIFT;
+}
+
+/**
+ * @kgd_gfx_v9_get_cu_occupancy: Reads relevant registers associated with each
+ * shader engine and aggregates the number of waves that are in fight for the
+ * process whose pasid is provided as a parameter. The process could have ZERO
+ * or more queues running and submitting waves to compute units.
+ *
+ * @kgd: Handle of device from which to get number of waves in flight
+ * @pasid: Identifies the process for which this query call is invoked
+ * @wave_cnt: Output parameter updated with number of waves in flight that
+ * belong to process with given pasid
+ * @max_waves_per_cu: Output parameter updated with maximum number of waves
+ * possible per Compute Unit
+ *
+ * @note: It's possible that the device has too many queues (oversubscription)
+ * in which case a VMID could be remapped to a different PASID. This could lead
+ * to in accurate wave count. Following is a high-level sequence:
+ *	Time T1: vmid = getVmid(); vmid is associated with Pasid P1
+ *	Time T2: passId = getPasId(vmid); vmid is associated with Pasid P2
+ * In the sequence above wave count obtained from time T1 will be incorrectly
+ * lost or added to total wave count.
+ *
+ * The registers that provide the waves in flight are:
+ *
+ * SPI_CSQ_WF_ACTIVE_STATUS - bit-map of queues per pipe. The bit is ON if a
+ * queue is slotted, OFF if there is no queue. A process could have ZERO or
+ * more queues slotted and submitting waves to be run on compute units. Even
+ * when there is a queue it is possible there could be zero wave fronts, this
+ * can happen when queue is waiting on top-of-pipe events - e.g. waitRegMem
+ * command
+ *
+ * For each bit that is ON from above:
+ *
+ *	Read (SPI_CSQ_WF_ACTIVE_COUNT_0 + queue_idx) register. It provides the
+ *	number of waves that are in flight for the queue at specified index. The
+ *	index ranges from 0 to 7.
+ *
+ *	If non-zero waves are in fligth, read CP_HQD_VMID register to obtain VMID
+ *	of the wave(s).
+ *
+ *	Determine if VMID from above step maps to pasid provided as parameter. If
+ *	it matches agrregate the wave count. That the VMID will not match pasid is
+ *	a nor
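The sequence in the comment above (check the status bit, read the per-queue count, read the VMID, match it against the PASID, aggregate) can be sketched in user space. This is only an illustration under stated assumptions: the `ex_` names, the table sizes, and the in-memory arrays standing in for the SPI_CSQ_WF_ACTIVE_STATUS, SPI_CSQ_WF_ACTIVE_COUNT_0 and CP_HQD_VMID registers are all hypothetical and are not part of the patch or the driver.

```c
#include <assert.h>
#include <stdint.h>

/* All sizes and names below are illustrative assumptions, not driver values. */
#define EX_NUM_PIPES       4
#define EX_QUEUES_PER_PIPE 8
#define EX_NUM_VMIDS       16

/* In-memory stand-in for the registers named in the patch comment. */
struct ex_device {
	uint8_t  active_status[EX_NUM_PIPES];                  /* bit ON: queue slotted */
	uint32_t wave_count[EX_NUM_PIPES][EX_QUEUES_PER_PIPE]; /* waves in flight */
	uint8_t  queue_vmid[EX_NUM_PIPES][EX_QUEUES_PER_PIPE]; /* VMID of each queue */
	uint32_t vmid_pasid[EX_NUM_VMIDS];                     /* VMID -> PASID */
};

/* Sum the waves in flight over all slotted queues whose VMID maps to pasid. */
static uint32_t ex_count_waves(const struct ex_device *dev, uint32_t pasid)
{
	uint32_t total = 0;
	int queue_idx;

	for (queue_idx = 0; queue_idx < EX_NUM_PIPES * EX_QUEUES_PER_PIPE; queue_idx++) {
		/* Same index split as get_wave_count() in the patch */
		int pipe_idx = queue_idx / EX_QUEUES_PER_PIPE;
		int queue_slot = queue_idx % EX_QUEUES_PER_PIPE;
		uint32_t waves;
		uint8_t vmid;

		/* Skip slots that hold no queue */
		if (!(dev->active_status[pipe_idx] & (1u << queue_slot)))
			continue;

		/* A slotted queue may still have zero waves in flight */
		waves = dev->wave_count[pipe_idx][queue_slot];
		if (waves == 0)
			continue;

		/* Only count waves whose VMID currently maps to the pasid */
		vmid = dev->queue_vmid[pipe_idx][queue_slot];
		assert(vmid < EX_NUM_VMIDS);
		if (dev->vmid_pasid[vmid] == pasid)
			total += waves;
	}
	return total;
}
```

In this sketch the VMID-to-PASID table is a fixed snapshot, so the T1/T2 remapping race described in the @note cannot occur; on real hardware the mapping may change between the two reads, which is the inaccuracy the note warns about.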
[PATCH 2/3] drm/amd/amdgpu: Define and implement a function that collects number of waves that are in flight.
[Why]
Allow user to know how many compute units (CU) are in use at any given
moment.

[How]
Read registers of SQ that give number of waves that are in flight
of various queues. Use this information to determine number of CU's
in use.

Signed-off-by: Ramesh Errabolu
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 206 ++
 .../gpu/drm/amd/include/kgd_kfd_interface.h | 11 +
 2 files changed, 217 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index e6aede725197..2f8c8140734e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -38,7 +38,9 @@
 #include "soc15d.h"
 #include "mmhub_v1_0.h"
 #include "gfxhub_v1_0.h"
+#include "gfx_v9_0.h"

+struct kfd_dev;

 enum hqd_dequeue_request_type {
 	NO_ACTION = 0,
@@ -706,6 +708,209 @@ void kgd_gfx_v9_set_vm_context_page_table_base(struct kgd_dev *kgd,
 	gfxhub_v1_0_setup_vm_pt_regs(adev, vmid, page_table_base);
 }

+static void lock_spi_csq_mutexes(struct amdgpu_device *adev)
+{
+	mutex_lock(&adev->srbm_mutex);
+	mutex_lock(&adev->grbm_idx_mutex);
+
+}
+
+static void unlock_spi_csq_mutexes(struct amdgpu_device *adev)
+{
+	mutex_unlock(&adev->grbm_idx_mutex);
+	mutex_unlock(&adev->srbm_mutex);
+}
+
+/**
+ * @get_wave_count: Read device registers to get number of waves in flight for
+ * a particular queue. The method also returns the VMID associated with the
+ * queue.
+ *
+ * @adev: Handle of device whose registers are to be read
+ *
+ * @queue_idx: Index of queue in the queue-map bit-field
+ *
+ * @wave_cnt: Output parameter updated with number of waves in flight
+ *
+ * @vmid: Output parameter updated with VMID of queue whose wave count
+ * is being collected
+ */
+static void get_wave_count(struct amdgpu_device *adev, int queue_idx,
+		int *wave_cnt, int *vmid)
+{
+	int pipe_idx;
+	int queue_slot;
+	unsigned int reg_val;
+
+	/*
+	 * By policy queues at slots 0 and 1 are reserved for non-compute
+	 * queues i.e. those managed for graphic functions.
+	 */
+	if ((queue_idx % adev->gfx.mec.num_queue_per_pipe) < 2)
+		return;
+
+	/*
+	 * Queue belongs to a compute workload. Determine the PIPE index
+	 * associated with queue and program GRBM accordingly:
+	 * MEID = 1, PIPEID = pipe_idx, QUEUEID = queue_idx, VMID = 0
+	 */
+	pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
+	queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
+	soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
+
+	/*
+	 * Read from register number of waves in flight. If non-zero get the
+	 * VMID associated with queue
+	 */
+	reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
+			 queue_slot);
+	*wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
+	if (*wave_cnt != 0)
+		*vmid = (RREG32_SOC15(GC, 0, mmCP_HQD_VMID) &
+			CP_HQD_VMID__VMID_MASK) >> CP_HQD_VMID__VMID__SHIFT;
+}
+
+/**
+ * @kgd_gfx_v9_get_cu_occupancy: Reads relevant registers associated with each
+ * shader engine and aggregates the number of waves that are in flight for the
+ * process whose pasid is provided as a parameter. The process could have ZERO
+ * or more queues running and submitting waves to compute units.
+ *
+ * @note: It's possible that the device has too many queues (oversubscription)
+ * in which case a VMID could be remapped to a different PASID. This could lead
+ * to an inaccurate wave count. Following is a high-level sequence:
+ *	Time T1: vmid = getVmid(); vmid is associated with Pasid P1
+ *	Time T2: passId = getPasId(vmid); vmid is associated with Pasid P2
+ * In the sequence above wave count obtained from time T1 will be incorrectly
+ * lost or added to total wave count.
+ *
+ * @kgd: Handle of device from which to get number of waves in flight
+ *
+ * @pasid: Identifies the process for which this query call is invoked
+ *
+ * @wave_cnt: Output parameter updated with number of waves in flight that
+ * belong to process with given pasid
+ *
+ * The registers that provide the waves in flight are:
+ *
+ * SPI_CSQ_WF_ACTIVE_STATUS - bit-map of queues per pipe. At any moment there
+ * can be a max of 32 queues that could submit wave fronts to be run by compute
+ * units. The bit is ON if a queue is slotted, OFF if there is no queue. The
+ * process could have ZERO or more queues slotted and submitting waves to be
+ * run on compute units. Even when there is a queue it is possible there could
+ * be zero wave fronts, this can happen when queue is waiting on top-of-pipe
+ * events - e.g. waitRegMem command
+ *
+ * For each bit that is ON from above:
+ *
+ *	Read (SPI_CSQ_WF_ACTIVE_COUNT_0 + queue_idx) register. It provides the
+
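The slot policy added in this revision (slots 0 and 1 of every pipe are skipped) can be checked in isolation with a small user-space sketch. The `ex_` helpers and the 32-queue, 8-per-pipe figures used in the usage note are assumptions for illustration; only the "slot index below 2 is reserved" rule comes from the patch comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Slots 0 and 1 of each pipe are reserved, per the comment in the patch. */
#define EX_RESERVED_SLOTS 2

/* True when queue_idx falls on a slot that the wave count would examine. */
static bool ex_is_compute_slot(int queue_idx, int queues_per_pipe)
{
	assert(queues_per_pipe > 0);
	return (queue_idx % queues_per_pipe) >= EX_RESERVED_SLOTS;
}

/* Number of queue indices in [0, max_queues) that pass the slot filter. */
static int ex_count_compute_slots(int max_queues, int queues_per_pipe)
{
	int n = 0;
	int q;

	for (q = 0; q < max_queues; q++)
		if (ex_is_compute_slot(q, queues_per_pipe))
			n++;
	return n;
}
```

With 32 queues and an assumed 8 queues per pipe, `ex_count_compute_slots(32, 8)` leaves 24 indices to examine. Note also that the early return in this version of get_wave_count() leaves `*wave_cnt` and `*vmid` unwritten for reserved slots, so the caller would presumably have to initialize both before the call; the calling code is not visible in this excerpt.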