RE: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
[AMD Official Use Only - General]

Hi Andrey,

Yes, moving the irq disable can fix the issue. The change in amdgpu_fence_process is just meant to make sure the driver can correct itself from an overflow situation. I didn't know about the previous issue there. Do you know if the issue still exists? Or is it on VCE only?

Thanks,
Victor

-----Original Message-----
From: Grodzovsky, Andrey
Sent: Friday, September 16, 2022 9:50 PM
To: Koenig, Christian ; Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deng, Emily
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

On 2022-09-16 01:18, Christian König wrote:
> On 15.09.22 at 22:37, Andrey Grodzovsky wrote:
>>
>> On 2022-09-15 15:26, Christian König wrote:
>>> On 15.09.22 at 20:29, Andrey Grodzovsky wrote:

On 2022-09-15 06:09, Zhao, Victor wrote:
> [AMD Official Use Only - General]
>
> Hi Christian,
>
> The test sequence is executing a compute engine hang while running a lot of containers submitting gfx jobs. We have advanced TDR mode and mode2 reset enabled on the driver.
> When a compute hang job timeout happens, the 2 jobs on the gfx pending list may be signaled after drm_sched_stop. So they will not be removed from the pending list but will have the DMA_FENCE_FLAG_SIGNALED_BIT set.
> At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and removed from the pending list.
> At the resubmit step, the second job (with the signaled bit) will be resubmitted. Since it still has the signaled bit set, drm_sched_job_done will be called directly. This decreases hw_rq_count, which allows more jobs to be emitted, but does not clean the fence_drv rcu ptr.
> This results in an overflow in the fence_drv. Since we use num_fences_mask in amdgpu_fence_process, when the overflow happens the signaling of some jobs will be skipped, which results in an infinite wait for the fence_drv rcu ptr.
>
> So closing the irq before sched_stop could avoid signaling jobs after drm_sched_stop.
> And signaling jobs one by one in fence_process instead of using a mask will handle the overflow situation.
>
> Another fix could be to skip submitting jobs which already signaled during the resubmit stage, which may look cleaner.
>
> Please help give some advice.

How about the code below instead? The real problem is that we reuse a dma fence twice, which is not according to the dma fence design, so maybe this can help?

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd
 	if (job && job->job_run_counter) {
 		/* reinit seq for resubmitted jobs */
 		fence->seqno = seq;
+
+		/* For resubmitted jobs clear the signaled bit */
+		clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+

>>> Upstream will pretty much kill you for that.
>>>
>>> Re-setting a fence from a signaled to an unsignaled state is a massive no-go.
>>>
>>> Christian.
>>
>> Is it worse than doing fence->seqno = seq; ? That is already a huge hack, no?
>
> No, it's equally bad. I don't think we can do either.
>
> Christian.

And all those ugly hacks are there because we reuse a dma_fence (the hw_fence embedded into the job), and correct me if I am wrong, but I don't think a dma_fence is ever supposed to be reused.

So maybe, like Victor suggested, we should move the irq close and flush before sched_stop - this in my opinion should solve the issue. But Victor - why do you then still need the change in amdgpu_fence_process? You will not have the overflow situation, because by moving irq_disable before the stop, any job that signaled will be removed from the scheduler pending list anyway. Also note that this change reverts 'drm/amdgpu: sanitize fence numbers' and could reintroduce that bug.
Andrey

>
>> Andrey
>>
>>>

 		/* To be in line with external fence creation and other drivers */
 		dma_fence_get(fence);
 	} else {

Andrey

> Thanks,
> Victor
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: Thursday, September 15, 2022 2:32 PM
> To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey
> Cc: Deng, Emily
> Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
>
> On 15.09.22 at 06:02, Zhao, Victor wrote:
>> [AMD Official Use Only - General]
>>
>> Ping.
>>
>> Hi @Koenig, Christian and @Grodzovsky,
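The overflow Victor describes comes from indexing a fixed-size ring of fence slots with `seqno & num_fences_mask`. A toy user-space model (hypothetical names and sizes, not the driver code) shows why emitting more fences than there are slots before processing runs makes some sequence numbers unsignalable:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the amdgpu fence ring: NUM_FENCES slots indexed by
 * seqno & FENCE_MASK. If more fences are emitted than there are slots
 * before processing runs, older slots are overwritten and the fences
 * stored there can never be signaled. */
#define NUM_FENCES 8u
#define FENCE_MASK (NUM_FENCES - 1u)

static uint32_t ring[NUM_FENCES];   /* seqno currently stored in each slot */
static uint32_t sync_seq, last_seq; /* last emitted / last processed seqno */

static void emit_fence(void)
{
	++sync_seq;
	/* may overwrite a slot whose fence was never processed */
	ring[sync_seq & FENCE_MASK] = sync_seq;
}

/* Walk last_seq -> sync_seq like amdgpu_fence_process; return how many
 * seqnos had their slot already reused by a newer fence. */
static unsigned int process_fences(void)
{
	unsigned int lost = 0;

	while (last_seq != sync_seq) {
		++last_seq;
		if (ring[last_seq & FENCE_MASK] != last_seq)
			++lost; /* slot overwritten: signal skipped */
	}
	return lost;
}
```

With 8 slots, emitting 10 fences back-to-back loses exactly 2 signals, which is the kind of skipped signal that leads to the infinite wait described above.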
Re: [PATCH 1/2] drm/amd/display: Reduce number of arguments of dml314's CalculateWatermarksAndDRAMSpeedChangeSupport()
On Fri, Sep 16, 2022 at 03:04:53PM -0700, Tom Rix wrote:
>
> On 9/16/22 2:06 PM, Nathan Chancellor wrote:
> > Most of the arguments are identical between the two call sites and they
> > can be accessed through the 'struct vba_vars_st' pointer. This reduces
> > the total amount of stack space that
> > dml314_ModeSupportAndSystemConfigurationFull() uses by 240 bytes with
> > LLVM 16 (2216 -> 1976), helping clear up the following clang warning:
> >
> > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:4020:6: error: stack frame size (2216) exceeds limit (2048) in 'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
> >    void dml314_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
> >         ^
> >    1 error generated.
> >
> > Link: https://github.com/ClangBuiltLinux/linux/issues/1710
> > Reported-by: "kernelci.org bot"
> > Signed-off-by: Nathan Chancellor
>
> Nathan,
>
> I like this change but I don't think it goes far enough.
>
> There are many similar functions in this file, and there are other display_mode_vba_*.c files that pass too many vba_vars_st elements.
>
> I think most/all of the static functions should be refactored to pass vba_vars_st * or vba_vars_st **.
>
> fwiw, i found the calling function DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation hilariously long :)
>
> I'll do the change if you want to pass this to me, I promise not to add to the above function name.

Right, there is definitely more that could be done here; I just picked the couple of functions that would appear to make the most impact, as I am only concerned with keeping the code warning-free with clang so that -Werror does not break us.
I think it makes sense to take this series to fix the warnings right now (especially since this patch has technically already been accepted, as it was applied to dcn31) then follow up with refactoring, which I am more than happy to let you do if you so desire :) Thank you for the input as always! Cheers, Nathan > > --- > > > > This is just commit ab2ac59c32db ("drm/amd/display: Reduce number of > > arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport()") > > applied to dml314. > > > > .../dc/dml/dcn314/display_mode_vba_314.c | 248 -- > > 1 file changed, 52 insertions(+), 196 deletions(-) > > > > diff --git > > a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c > > b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c > > index 2829f179f982..32ceb72f7a14 100644 > > --- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c > > +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c > > @@ -325,64 +325,28 @@ static void > > CalculateVupdateAndDynamicMetadataParameters( > > static void CalculateWatermarksAndDRAMSpeedChangeSupport( > > struct display_mode_lib *mode_lib, > > unsigned int PrefetchMode, > > - unsigned int NumberOfActivePlanes, > > - unsigned int MaxLineBufferLines, > > - unsigned int LineBufferSize, > > - unsigned int WritebackInterfaceBufferSize, > > double DCFCLK, > > double ReturnBW, > > - bool SynchronizedVBlank, > > - unsigned int dpte_group_bytes[], > > - unsigned int MetaChunkSize, > > double UrgentLatency, > > double ExtraLatency, > > - double WritebackLatency, > > - double WritebackChunkSize, > > double SOCCLK, > > - double DRAMClockChangeLatency, > > - double SRExitTime, > > - double SREnterPlusExitTime, > > - double SRExitZ8Time, > > - double SREnterPlusExitZ8Time, > > double DCFCLKDeepSleep, > > unsigned int DETBufferSizeY[], > > unsigned int DETBufferSizeC[], > > unsigned int SwathHeightY[], > > unsigned int SwathHeightC[], > > - unsigned int LBBitPerPixel[], > > double 
SwathWidthY[], > > double SwathWidthC[], > > - double HRatio[], > > - double HRatioChroma[], > > - unsigned int vtaps[], > > - unsigned int VTAPsChroma[], > > - double VRatio[], > > - double VRatioChroma[], > > - unsigned int HTotal[], > > - double PixelClock[], > > - unsigned int BlendingAndTiming[], > > unsigned int DPPPerPlane[], > > double BytePerPixelDETY[], > > double BytePerPixelDETC[], > > - double DSTXAfterScaler[], > > - double DSTYAfterScaler[], > > - bool WritebackEnable[], > > - enum source_format_class WritebackPixelFormat[], > > - double WritebackDestinationWidth[], > > - double WritebackDestinationHeight[], > > - double
[PATCH 2/2] drm/amd/display: Reduce number of arguments of dml314's CalculateFlipSchedule()
Most of the arguments are identical between the two call sites and they
can be accessed through the 'struct vba_vars_st' pointer. This reduces
the total amount of stack space that
dml314_ModeSupportAndSystemConfigurationFull() uses by 112 bytes with
LLVM 16 (1976 -> 1864), helping clear up the following clang warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:4020:6: error: stack frame size (2216) exceeds limit (2048) in 'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
void dml314_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
     ^
1 error generated.

Link: https://github.com/ClangBuiltLinux/linux/issues/1710
Reported-by: "kernelci.org bot"
Signed-off-by: Nathan Chancellor
---
This is just commit 1dbec5b4b0ef ("drm/amd/display: Reduce number of arguments of dml31's CalculateFlipSchedule()") applied to dml314.

 .../dc/dml/dcn314/display_mode_vba_314.c | 172 +-
 1 file changed, 47 insertions(+), 125 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
index 32ceb72f7a14..e4dfa714207a 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
@@ -265,33 +265,13 @@ static void CalculateRowBandwidth(
 static void CalculateFlipSchedule(
 		struct display_mode_lib *mode_lib,
+		unsigned int k,
 		double HostVMInefficiencyFactor,
 		double UrgentExtraLatency,
 		double UrgentLatency,
-		unsigned int GPUVMMaxPageTableLevels,
-		bool HostVMEnable,
-		unsigned int HostVMMaxNonCachedPageTableLevels,
-		bool GPUVMEnable,
-		double HostVMMinPageSize,
 		double PDEAndMetaPTEBytesPerFrame,
 		double MetaRowBytes,
-		double DPTEBytesPerRow,
-		double BandwidthAvailableForImmediateFlip,
-		unsigned int TotImmediateFlipBytes,
-		enum source_format_class SourcePixelFormat,
-		double LineTime,
-		double VRatio,
-		double VRatioChroma,
-		double Tno_bw,
-		bool DCCEnable,
-		unsigned int dpte_row_height,
-		unsigned int meta_row_height,
-		unsigned int dpte_row_height_chroma,
-		unsigned int meta_row_height_chroma,
-		double *DestinationLinesToRequestVMInImmediateFlip,
-		double *DestinationLinesToRequestRowInImmediateFlip,
-		double *final_flip_bw,
-		bool *ImmediateFlipSupportedForPipe);
+		double DPTEBytesPerRow);
 static double CalculateWriteBackDelay(
 		enum source_format_class WritebackPixelFormat,
 		double WritebackHRatio,
@@ -2892,33 +2872,13 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman
 	for (k = 0; k < v->NumberOfActivePlanes; ++k) {
 		CalculateFlipSchedule(
 				mode_lib,
+				k,
 				HostVMInefficiencyFactor,
 				v->UrgentExtraLatency,
 				v->UrgentLatency,
-				v->GPUVMMaxPageTableLevels,
-				v->HostVMEnable,
-				v->HostVMMaxNonCachedPageTableLevels,
-				v->GPUVMEnable,
-				v->HostVMMinPageSize,
 				v->PDEAndMetaPTEBytesFrame[k],
 				v->MetaRowByte[k],
-				v->PixelPTEBytesPerRow[k],
-				v->BandwidthAvailableForImmediateFlip,
-				v->TotImmediateFlipBytes,
-				v->SourcePixelFormat[k],
-				v->HTotal[k] / v->PixelClock[k],
-				v->VRatio[k],
-				v->VRatioChroma[k],
-				v->Tno_bw[k],
-				v->DCCEnable[k],
-				v->dpte_row_height[k],
-				v->meta_row_height[k],
-				v->dpte_row_height_chroma[k],
-				v->meta_row_height_chroma[k],
[PATCH 1/2] drm/amd/display: Reduce number of arguments of dml314's CalculateWatermarksAndDRAMSpeedChangeSupport()
Most of the arguments are identical between the two call sites and they
can be accessed through the 'struct vba_vars_st' pointer. This reduces
the total amount of stack space that
dml314_ModeSupportAndSystemConfigurationFull() uses by 240 bytes with
LLVM 16 (2216 -> 1976), helping clear up the following clang warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:4020:6: error: stack frame size (2216) exceeds limit (2048) in 'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
void dml314_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
     ^
1 error generated.

Link: https://github.com/ClangBuiltLinux/linux/issues/1710
Reported-by: "kernelci.org bot"
Signed-off-by: Nathan Chancellor
---
This is just commit ab2ac59c32db ("drm/amd/display: Reduce number of arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport()") applied to dml314.

 .../dc/dml/dcn314/display_mode_vba_314.c | 248 --
 1 file changed, 52 insertions(+), 196 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
index 2829f179f982..32ceb72f7a14 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
@@ -325,64 +325,28 @@ static void CalculateVupdateAndDynamicMetadataParameters(
 static void CalculateWatermarksAndDRAMSpeedChangeSupport(
 		struct display_mode_lib *mode_lib,
 		unsigned int PrefetchMode,
-		unsigned int NumberOfActivePlanes,
-		unsigned int MaxLineBufferLines,
-		unsigned int LineBufferSize,
-		unsigned int WritebackInterfaceBufferSize,
 		double DCFCLK,
 		double ReturnBW,
-		bool SynchronizedVBlank,
-		unsigned int dpte_group_bytes[],
-		unsigned int MetaChunkSize,
 		double UrgentLatency,
 		double ExtraLatency,
-		double WritebackLatency,
-		double WritebackChunkSize,
 		double SOCCLK,
-		double DRAMClockChangeLatency,
-		double SRExitTime,
-		double SREnterPlusExitTime,
-		double SRExitZ8Time,
-		double SREnterPlusExitZ8Time,
 		double DCFCLKDeepSleep,
 		unsigned int DETBufferSizeY[],
 		unsigned int DETBufferSizeC[],
 		unsigned int SwathHeightY[],
 		unsigned int SwathHeightC[],
-		unsigned int LBBitPerPixel[],
 		double SwathWidthY[],
 		double SwathWidthC[],
-		double HRatio[],
-		double HRatioChroma[],
-		unsigned int vtaps[],
-		unsigned int VTAPsChroma[],
-		double VRatio[],
-		double VRatioChroma[],
-		unsigned int HTotal[],
-		double PixelClock[],
-		unsigned int BlendingAndTiming[],
 		unsigned int DPPPerPlane[],
 		double BytePerPixelDETY[],
 		double BytePerPixelDETC[],
-		double DSTXAfterScaler[],
-		double DSTYAfterScaler[],
-		bool WritebackEnable[],
-		enum source_format_class WritebackPixelFormat[],
-		double WritebackDestinationWidth[],
-		double WritebackDestinationHeight[],
-		double WritebackSourceHeight[],
 		bool UnboundedRequestEnabled,
 		unsigned int CompressedBufferSizeInkByte,
 		enum clock_change_support *DRAMClockChangeSupport,
-		double *UrgentWatermark,
-		double *WritebackUrgentWatermark,
-		double *DRAMClockChangeWatermark,
-		double *WritebackDRAMClockChangeWatermark,
 		double *StutterExitWatermark,
 		double *StutterEnterPlusExitWatermark,
 		double *Z8StutterExitWatermark,
-		double *Z8StutterEnterPlusExitWatermark,
-		double *MinActiveDRAMClockChangeLatencySupported);
+		double *Z8StutterEnterPlusExitWatermark);

 static void CalculateDCFCLKDeepSleep(
 		struct display_mode_lib *mode_lib,
@@ -3041,64 +3005,28 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman
 	CalculateWatermarksAndDRAMSpeedChangeSupport(
 			mode_lib,
 			PrefetchMode,
-			v->NumberOfActivePlanes,
-			v->MaxLineBufferLines,
-			v->LineBufferSize,
-			v->WritebackInterfaceBufferSize,
 			v->DCFCLK,
 			v->ReturnBW,
-
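The refactoring pattern in these two patches can be illustrated in miniature. The struct and field names below are invented for illustration; the point is that passing one struct pointer plus a plane index `k` replaces dozens of per-call scalar copies, each of which costs stack space at the call site:

```c
#include <assert.h>

/* Hypothetical miniature of the vba_vars_st idea: per-plane state lives
 * in one struct, so helpers can take a pointer + index instead of long
 * scalar argument lists. (Real vba_vars_st has hundreds of fields.) */
struct vars {
	double urgent_latency;      /* illustrative extra fields */
	double vratio[4];
	unsigned int htotal[4];
	double pixel_clock[4];
};

/* Old style: every value is copied onto the stack at each call site. */
static double line_time_old(unsigned int htotal, double pixel_clock)
{
	return htotal / pixel_clock;
}

/* New style: one pointer plus the plane index k; same result, far
 * smaller call frames when there are dozens of such parameters. */
static double line_time_new(const struct vars *v, unsigned int k)
{
	return v->htotal[k] / v->pixel_clock[k];
}
```

Both styles compute the same value; the difference only shows up in the caller's stack frame size, which is exactly what the -Wframe-larger-than warning measures.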
Re: Final gang submit patches
On Fri, Sep 16, 2022 at 5:08 AM Christian König wrote:
>
> Hey guys,
>
> so thanks to Ruijing I was finally able to hammer out all the known VCN regressions from this patch set.
>
> Alex can you review those? AFAIK you are pretty much the only other person deep enough in the CS IOCTL for that.

These look good to me. Just a couple of comments on some of the patches. We probably also want a patch to bump the driver version so userspace knows when this is available. With those addressed, the series is:

Reviewed-by: Alex Deucher

>
> Going to tackle the userptr issue Bas stumbled over next.
>
> Thanks,
> Christian.
>
>
Re: [PATCH 8/8] drm/amdgpu: add gang submit frontend v5
On Fri, Sep 16, 2022 at 5:09 AM Christian König wrote: > > Allows submitting jobs as gang which needs to run on multiple engines at the > same time. > > All members of the gang get the same implicit, explicit and VM dependencies. > So > no gang member will start running until everything else is ready. > > The last job is considered the gang leader (usually a submission to the GFX > ring) and used for signaling output dependencies. > > Each job is remembered individually as user of a buffer object, so there is no > joining of work at the end. > > v2: rebase and fix review comments from Andrey and Yogesh > v3: use READ instead of BOOKKEEP for now because of VM unmaps, set gang > leader only when necessary > v4: fix order of pushing jobs and adding fences found by Trigger. > v5: fix job index calculation and adding IBs to jobs > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 269 ++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h | 10 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 12 +- > 5 files changed, 195 insertions(+), 99 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c > index 4f5bd96000ec..c7b1a2dfde13 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c > @@ -686,6 +686,7 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev, > ib->length_dw = ib_len; > /* This works for NO_HWS. 
TODO: need to handle without knowing VMID */ > job->vmid = vmid; > + job->num_ibs = 1; > > ret = amdgpu_ib_schedule(ring, 1, ib, job, ); > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index 58088c663125..964052377991 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -64,11 +64,11 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser > *p, > return 0; > } > > -static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, > - struct drm_amdgpu_cs_chunk_ib *chunk_ib, > - unsigned int *num_ibs) > +static int amdgpu_cs_job_idx(struct amdgpu_cs_parser *p, > +struct drm_amdgpu_cs_chunk_ib *chunk_ib) > { > struct drm_sched_entity *entity; > + unsigned int i; > int r; > > r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type, > @@ -77,17 +77,38 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, > if (r) > return r; > > - /* Abort if there is no run queue associated with this entity. > -* Possibly because of disabled HW IP*/ > + /* > +* Abort if there is no run queue associated with this entity. > +* Possibly because of disabled HW IP. 
> +*/ > if (entity->rq == NULL) > return -EINVAL; > > - /* Currently we don't support submitting to multiple entities */ > - if (p->entity && p->entity != entity) > + /* Check if we can add this IB to some existing job */ > + for (i = 0; i < p->gang_size; ++i) > + if (p->entities[i] == entity) > + return i; > + > + /* If not increase the gang size if possible */ > + if (i == AMDGPU_CS_GANG_SIZE) > return -EINVAL; > > - p->entity = entity; > - ++(*num_ibs); > + p->entities[i] = entity; > + p->gang_size = i + 1; > + return i; > +} > + > +static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, > + struct drm_amdgpu_cs_chunk_ib *chunk_ib, > + unsigned int *num_ibs) > +{ > + int r; > + > + r = amdgpu_cs_job_idx(p, chunk_ib); > + if (r < 0) > + return r; > + > + ++(num_ibs[r]); > return 0; > } > > @@ -161,11 +182,12 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, >union drm_amdgpu_cs *cs) > { > struct amdgpu_fpriv *fpriv = p->filp->driver_priv; > + unsigned int num_ibs[AMDGPU_CS_GANG_SIZE] = { }; > struct amdgpu_vm *vm = >vm; > uint64_t *chunk_array_user; > uint64_t *chunk_array; > - unsigned size, num_ibs = 0; > uint32_t uf_offset = 0; > + unsigned int size; > int ret; > int i; > > @@ -228,7 +250,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, > if (size < sizeof(struct drm_amdgpu_cs_chunk_ib)) > goto free_partial_kdata; > > - ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, > _ibs); > + ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, num_ibs); > if (ret) > goto
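The entity-to-job mapping that `amdgpu_cs_job_idx()` introduces can be sketched in plain user-space C. This is a simplified model, not the kernel code: entities are opaque pointers, and -1 stands in for the driver's -EINVAL when the gang is full:

```c
#include <assert.h>
#include <stddef.h>

#define GANG_SIZE 4

/* Simplified model of amdgpu_cs_job_idx(): return the slot of an
 * already-known entity, or claim a new slot if the gang has room. */
struct parser {
	const void *entities[GANG_SIZE];
	unsigned int gang_size;
};

static int job_idx(struct parser *p, const void *entity)
{
	unsigned int i;

	/* Check if this IB can join an existing job */
	for (i = 0; i < p->gang_size; ++i)
		if (p->entities[i] == entity)
			return i;

	/* If not, grow the gang if possible */
	if (i == GANG_SIZE)
		return -1; /* gang full: -EINVAL in the driver */

	p->entities[i] = entity;
	p->gang_size = i + 1;
	return i; /* new gang member */
}
```

IBs for the same entity collapse into one job; distinct entities claim new slots until the fixed gang size is exhausted, mirroring how the patch replaces the old "only one entity per CS" check.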
Re: [PATCH 3/8] drm/amdgpu: fix user fence CS check
On 16.09.22 at 20:01, Alex Deucher wrote:

On Fri, Sep 16, 2022 at 5:09 AM Christian König wrote:

An offset of zero is valid, check if the BO is present or not.

Signed-off-by: Christian König

Since this is a bug fix, should we make this the first patch in the series?

Good point, going to re-arrange this before pushing.

Christian.

Alex

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 5c37dde97ff2..265ed2118a80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -313,7 +313,7 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,

 	ring = to_amdgpu_ring(entity->rq->sched);
 	/* MM engine doesn't support user fences */
-	if (p->job->uf_addr && ring->funcs->no_user_fence)
+	if (p->uf_entry.tv.bo && ring->funcs->no_user_fence)
 		return -EINVAL;

 	if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
--
2.25.1
Re: [PATCH 3/8] drm/amdgpu: fix user fence CS check
On Fri, Sep 16, 2022 at 5:09 AM Christian König wrote: > > An offset of zero is valid, check if the BO is present or not. > > Signed-off-by: Christian König Since this is a bug fix, should we make the first patch in the series? Alex > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index 5c37dde97ff2..265ed2118a80 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -313,7 +313,7 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p, > > ring = to_amdgpu_ring(entity->rq->sched); > /* MM engine doesn't support user fences */ > - if (p->job->uf_addr && ring->funcs->no_user_fence) > + if (p->uf_entry.tv.bo && ring->funcs->no_user_fence) > return -EINVAL; > > if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX && > -- > 2.25.1 >
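The one-character fix above encodes a general lesson: don't use a zero value as a "not present" sentinel when zero is a legal value; test for presence directly. A minimal model of the bug pattern (hypothetical struct, not the amdgpu one):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the user fence entry: a buffer object pointer plus
 * an offset into it. Offset 0 is perfectly valid. */
struct fence_entry {
	const void *bo;
	unsigned long offset;
};

/* Bug pattern: offset == 0 misread as "no user fence". */
static bool has_user_fence_buggy(const struct fence_entry *e)
{
	return e->offset != 0; /* wrong: rejects a valid fence at offset 0 */
}

/* Fixed pattern: presence of the BO pointer is unambiguous. */
static bool has_user_fence_fixed(const struct fence_entry *e)
{
	return e->bo != NULL;
}
```

A user fence placed at offset 0 is invisible to the buggy check but correctly detected by the pointer-presence check, which is exactly what the patch switches to.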
Re: [PATCH 7/7] drm/amdgpu: Fix amdgpu_vm_pt_free warning
On 16.09.22 at 17:05, Philip Yang wrote:

Freeing a page table BO from a context where the vm resv lock is not held generates the warning below. Add a pt_free_work item to the vm to free page table BOs from the vm->pt_freed list: pass the vm resv lock status down from the page table update caller, and when called with the vm resv unlocked, add the vm_bo entry to vm->pt_freed and schedule pt_free_work.

WARNING: CPU: 12 PID: 3238 at drivers/gpu/drm/ttm/ttm_bo.c:106 ttm_bo_set_bulk_move+0xa1/0xc0
Call Trace:
 amdgpu_vm_pt_free+0x42/0xd0 [amdgpu]
 amdgpu_vm_pt_free_dfs+0xb3/0xf0 [amdgpu]
 amdgpu_vm_ptes_update+0x52d/0x850 [amdgpu]
 amdgpu_vm_update_range+0x2a6/0x640 [amdgpu]
 svm_range_unmap_from_gpus+0x110/0x300 [amdgpu]
 svm_range_cpu_invalidate_pagetables+0x535/0x600 [amdgpu]
 __mmu_notifier_invalidate_range_start+0x1cd/0x230
 unmap_vmas+0x9d/0x140
 unmap_region+0xa8/0x110

Signed-off-by: Philip Yang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    | 4 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    | 5 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 41 +--
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9d7f9b3670d7..3402fb62b4ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2057,6 +2057,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	spin_lock_init(&vm->status_lock);
 	INIT_LIST_HEAD(&vm->freed);
 	INIT_LIST_HEAD(&vm->done);
+	INIT_LIST_HEAD(&vm->pt_freed);
+	INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work);

 	/* create scheduler entities for page table updates */
 	r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_NORMAL,
@@ -2258,6 +2260,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);

+	flush_work(&vm->pt_free_work);
+
 	root = amdgpu_bo_ref(vm->root.bo);
 	amdgpu_bo_reserve(root, true);
 	amdgpu_vm_set_pasid(adev, vm, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98895c8fef6c..2275c692d1ca 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -278,6 +278,10 @@ struct amdgpu_vm { /* BOs which are invalidated, has been updated in the PTs */ struct list_headdone; + /* PT BOs scheduled to free and fill with zero if vm_resv is not hold */ + struct list_headpt_freed; + struct work_struct pt_free_work; + /* contains the page directory */ struct amdgpu_vm_bo_base root; struct dma_fence*last_update; @@ -473,6 +477,7 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params *params, int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params, uint64_t start, uint64_t end, uint64_t dst, uint64_t flags); +void amdgpu_vm_pt_free_work(struct work_struct *work); #if defined(CONFIG_DEBUG_FS) void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index 61a4b7182d44..25277380ef5f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -644,6 +644,27 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) amdgpu_bo_unref(>bo); } +void amdgpu_vm_pt_free_work(struct work_struct *work) +{ + struct amdgpu_vm_bo_base *entry, *next; + struct amdgpu_vm *vm; + LIST_HEAD(pt_freed); + + vm = container_of(work, struct amdgpu_vm, pt_free_work); + + spin_lock(>status_lock); + list_splice_init(>pt_freed, _freed); + spin_unlock(>status_lock); + + // flush_work in amdgpu_vm_fini ensure vm->root.bo is valid Please no // style comments in kernel code. Apart from that the whole series looks really good to me. Thanks, Christian. 
+ amdgpu_bo_reserve(vm->root.bo, true); + + list_for_each_entry_safe(entry, next, _freed, vm_status) + amdgpu_vm_pt_free(entry); + + amdgpu_bo_unreserve(vm->root.bo); +} + /** * amdgpu_vm_pt_free_dfs - free PD/PT levels * @@ -655,11 +676,24 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) */ static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev, struct amdgpu_vm *vm, - struct amdgpu_vm_pt_cursor *start) + struct amdgpu_vm_pt_cursor *start, + bool unlocked) { struct amdgpu_vm_pt_cursor cursor; struct amdgpu_vm_bo_base *entry; + if (unlocked) { + spin_lock(>status_lock); + for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry) + list_move(>vm_status,
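The deferred-free pattern in this patch - producers push entries onto a list under a cheap spinlock, and a worker later splices the whole list out under that lock before taking the heavier reservation lock to actually free - can be modeled in a few lines. This is a toy singly linked list with the locks elided, only loosely named after the patch:

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

struct vm {
	struct node *pt_freed; /* protected by status_lock in the driver */
};

/* Producer side: runs in a context that must not take the resv lock.
 * In the driver this push happens under vm->status_lock. */
static void push_freed(struct vm *vm, struct node *n)
{
	n->next = vm->pt_freed;
	vm->pt_freed = n;
}

/* Worker side: splice the whole list out in O(1) (under status_lock in
 * the driver), then process it at leisure. Returns entries freed. */
static int free_work(struct vm *vm)
{
	struct node *list = vm->pt_freed; /* splice out */
	vm->pt_freed = NULL;

	int freed = 0;
	/* amdgpu_bo_reserve(vm->root.bo, true) would go here */
	for (struct node *n = list; n != NULL; n = n->next)
		++freed; /* stands in for amdgpu_vm_pt_free(entry) */
	/* amdgpu_bo_unreserve(vm->root.bo) */
	return freed;
}
```

The splice keeps the spinlock hold time constant no matter how many page tables are queued, and the reservation lock is only taken once per batch rather than once per BO.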
Re: [PATCH 1/7] drm/amdgpu: Rename vm invalidate lock to status_lock
Am 16.09.22 um 17:05 schrieb Philip Yang: The vm status_lock will be used to protect all vm status lists. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 30 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 4 +++- 2 files changed, 18 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 690fd4f639f1..596f1ea8babc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -225,9 +225,9 @@ static void amdgpu_vm_bo_idle(struct amdgpu_vm_bo_base *vm_bo) */ static void amdgpu_vm_bo_invalidated(struct amdgpu_vm_bo_base *vm_bo) { - spin_lock(_bo->vm->invalidated_lock); + spin_lock(_bo->vm->status_lock); list_move(_bo->vm_status, _bo->vm->invalidated); - spin_unlock(_bo->vm->invalidated_lock); + spin_unlock(_bo->vm->status_lock); } /** @@ -256,9 +256,9 @@ static void amdgpu_vm_bo_relocated(struct amdgpu_vm_bo_base *vm_bo) */ static void amdgpu_vm_bo_done(struct amdgpu_vm_bo_base *vm_bo) { - spin_lock(_bo->vm->invalidated_lock); + spin_lock(_bo->vm->status_lock); list_move(_bo->vm_status, _bo->vm->done); - spin_unlock(_bo->vm->invalidated_lock); + spin_unlock(_bo->vm->status_lock); } /** @@ -936,7 +936,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t *vram_mem, amdgpu_bo_get_memory(bo_va->base.bo, vram_mem, gtt_mem, cpu_mem); } - spin_lock(>invalidated_lock); + spin_lock(>status_lock); list_for_each_entry_safe(bo_va, tmp, >invalidated, base.vm_status) { if (!bo_va->base.bo) continue; @@ -949,7 +949,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t *vram_mem, amdgpu_bo_get_memory(bo_va->base.bo, vram_mem, gtt_mem, cpu_mem); } - spin_unlock(>invalidated_lock); + spin_unlock(>status_lock); } /** * amdgpu_vm_bo_update - update all BO mappings in the vm page table @@ -1290,12 +1290,12 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev, return r; } - spin_lock(>invalidated_lock); + spin_lock(>status_lock); while 
(!list_empty(>invalidated)) { bo_va = list_first_entry(>invalidated, struct amdgpu_bo_va, base.vm_status); resv = bo_va->base.bo->tbo.base.resv; - spin_unlock(>invalidated_lock); + spin_unlock(>status_lock); /* Try to reserve the BO to avoid clearing its ptes */ if (!amdgpu_vm_debug && dma_resv_trylock(resv)) @@ -1310,9 +1310,9 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev, if (!clear) dma_resv_unlock(resv); - spin_lock(>invalidated_lock); + spin_lock(>status_lock); } - spin_unlock(>invalidated_lock); + spin_unlock(>status_lock); return 0; } @@ -1763,9 +1763,9 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev, } } - spin_lock(>invalidated_lock); + spin_lock(>status_lock); list_del(_va->base.vm_status); - spin_unlock(>invalidated_lock); + spin_unlock(>status_lock); list_for_each_entry_safe(mapping, next, _va->valids, list) { list_del(>list); @@ -2019,7 +2019,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm) INIT_LIST_HEAD(>moved); INIT_LIST_HEAD(>idle); INIT_LIST_HEAD(>invalidated); - spin_lock_init(>invalidated_lock); + spin_lock_init(>status_lock); INIT_LIST_HEAD(>freed); INIT_LIST_HEAD(>done); @@ -2584,7 +2584,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m) id = 0; seq_puts(m, "\tInvalidated BOs:\n"); - spin_lock(>invalidated_lock); + spin_lock(>status_lock); list_for_each_entry_safe(bo_va, tmp, >invalidated, base.vm_status) { if (!bo_va->base.bo) continue; @@ -2599,7 +2599,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m) continue; total_done += amdgpu_bo_print_info(id++, bo_va->base.bo, m); } - spin_unlock(>invalidated_lock); + spin_unlock(>status_lock); total_done_objs = id; seq_printf(m, "\tTotal idle size:%12lld\tobjs:\t%d\n", total_idle, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 9ecb7f663e19..98895c8fef6c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ 
-268,7 +268,9 @@ struct amdgpu_vm { /* regular invalidated BOs, but not yet updated in the PT */ struct
Re: [PATCH -next 1/6] drm/amd/display: clean up some inconsistent indentings
Applied the series. Thanks, Alex On Wed, Sep 14, 2022 at 9:58 PM Yang Li wrote: > > clean up some inconsistent indentings > > Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2177 > Reported-by: Abaci Robot > Signed-off-by: Yang Li > --- > .../gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c| 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c > b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c > index 4e3356d12147..8dfe639b6508 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c > +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c > @@ -1051,10 +1051,10 @@ static bool CalculatePrefetchSchedule( > bytes_pp = myPipe->BytePerPixelY + myPipe->BytePerPixelC; > /*rev 99*/ > prefetch_bw_pr = dml_min(1, bytes_pp * myPipe->PixelClock / (double) > myPipe->DPPPerPlane); > -max_Tsw = dml_max(PrefetchSourceLinesY, PrefetchSourceLinesC) * LineTime; > + max_Tsw = dml_max(PrefetchSourceLinesY, PrefetchSourceLinesC) * > LineTime; > prefetch_sw_bytes = PrefetchSourceLinesY * swath_width_luma_ub * > myPipe->BytePerPixelY + PrefetchSourceLinesC * swath_width_chroma_ub * > myPipe->BytePerPixelC; > prefetch_bw_oto = dml_max(bytes_pp * myPipe->PixelClock / > myPipe->DPPPerPlane, prefetch_sw_bytes / (dml_max(PrefetchSourceLinesY, > PrefetchSourceLinesC) * LineTime)); > -prefetch_bw_oto = dml_max(prefetch_bw_pr, prefetch_sw_bytes / max_Tsw); > + prefetch_bw_oto = dml_max(prefetch_bw_pr, prefetch_sw_bytes / > max_Tsw); > > min_Lsw = dml_max(1, dml_max(PrefetchSourceLinesY, > PrefetchSourceLinesC) / max_vratio_pre); > Lsw_oto = dml_ceil(4 * dml_max(prefetch_sw_bytes / prefetch_bw_oto / > LineTime, min_Lsw), 1) / 4; > -- > 2.20.1.7.g153144c >
Re: [PATCH][next] drm/amdkfd: Fix spelling mistake "detroyed" -> "destroyed"
Applied. Thanks! On Wed, Sep 14, 2022 at 1:15 PM Colin Ian King wrote: > > There is a spelling mistake in a pr_debug message. Fix it. > > Signed-off-by: Colin Ian King > --- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index 7522bf2d2f57..c70c026c9a93 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -898,7 +898,7 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf) > return VM_FAULT_SIGBUS; > } > if (!mmget_not_zero(svm_bo->eviction_fence->mm)) { > - pr_debug("addr 0x%lx of process mm is detroyed\n", addr); > + pr_debug("addr 0x%lx of process mm is destroyed\n", addr); > return VM_FAULT_SIGBUS; > } > > -- > 2.37.1 >
[pull] amdgpu, amdkfd, radeon drm-next-6.1
Hi Dave, Daniel, Updates for 6.1. The following changes since commit 213cb76ddc8b875e772f9f4d173feefa122716af: Merge tag 'drm-intel-gt-next-2022-09-09' of git://anongit.freedesktop.org/drm/drm-intel into drm-next (2022-09-12 21:12:23 +1000) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-6.1-2022-09-16 for you to fetch changes up to dacd2d2d9d800b7ab2ee2734578112532cba8105: drm/amd/display: fix boolconv.cocci warning (2022-09-14 12:38:53 -0400) amd-drm-next-6.1-2022-09-16: amdgpu: - PCI AER fixes - BACO fix - RAS fixes - XGMI fixes - Display SubVP fixes - DCN 3.2 updates - DCN 3.1.4 updates - LLVM fixes - CS cleanup in preparation for gang submit - Add some new GC CG registers - Misc cleanups amdkfd: - Fix CRIU regression - CPU fault on COW mapping fixes - Prefault fixes - Misc cleanups radeon: - Misc cleanups Alex Deucher (5): drm/amdgpu: add HDP remap functionality to nbio 7.7 drm/amdgpu: fix warning about missing imu prototype drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega drm/amdgpu: make sure to init common IP before gmc Alvin Lee (4): drm/amd/display: Update MBLK calculation for SubVP drm/amd/display: SW cursor fallback for SubVP drm/amd/display: Update viewport position for phantom pipes drm/amd/display: Refactor SubVP calculation to remove FPU Aric Cyr (4): drm/amd/display: Optimizations for DML math drm/amd/display: Fix divide by zero in DML drm/amd/display: Don't adjust VRR unnecessarily drm/amd/display: 3.2.202 Aurabindo Pillai (3): drm/amd/display: Revert "Fallback to SW cursor if SubVP + cursor too big" drm/amd/display: add workaround for subvp cursor corruption for DCN32/321 drm/amd/display: Fix register definitions for DCN32/321 Candice Li (4): drm/amdgpu: Rely on MCUMC_STATUS for umc v8_10 correctable error counter only drm/amdgpu: Enable full reset when RAS is supported on gc v11_0_0 drm/amdgpu: Add EEPROM 
I2C address for smu v13_0_0 drm/amdgpu: Skip reset error status for psp v13_0_0 Charlene Liu (1): drm/amd/display: update header files Chris Park (1): drm/amd/display: Correct I2C register offset Christian König (6): drm/amdgpu: remove SRIOV and MCBP dependencies from the CS drm/amdgpu: move setting the job resources drm/amdgpu: revert "partial revert "remove ctx->lock" v2" drm/amdgpu: use DMA_RESV_USAGE_BOOKKEEP v2 drm/amdgpu: cleanup CS init/fini and pass1 drm/amdgpu: reorder CS code Duncan Ma (1): drm/amd/display: Correct dram channel width for dcn314 Guchun Chen (1): drm/amd/pm: disable BACO entry/exit completely on several sienna cichlid cards Hamza Mahfooz (1): drm/amdgpu: use dirty framebuffer helper JeromeHong (1): drm/amd/display: Avoid force minimal transaction in case of surface_count equal to 0 Jiapeng Chong (13): drm/amd/display: make enc32_stream_encoder_dvi_set_stream_attribute static drm/amd/display: make dscl32_calc_lb_num_partitions static drm/amd/display: make dcn32_link_encoder_is_in_alt_mode and dcn32_link_encoder_get_max_link_cap static drm/amd/display: make mpc32_program_shaper and mpc32_program_3dlut static drm/amd/display: Remove the unused function copy_stream_update_to_stream() drm/amd/display: make some functions static drm/amd/display: make enc314_stream_encoder_dvi_set_stream_attribute static drm/amd/display: make optc314_phantom_crtc_post_enable static drm/amd/display: make some functions static drm/amd/display: make mmhubbub32_config_mcif_buf static drm/amd/display: make some functions static drm/amd/display: make some functions static drm/amd/display: make optc32_phantom_crtc_post_enable, optc32_setup_manual_trigger and optc32_set_drr static Jilin Yuan (1): drm/amd/display: fix repeated words in comments JinZe.Xu (1): drm/amd/display: Assign link type before check dsc workaround. 
Jingyu Wang (5): drm/amdgpu: cleanup coding style in amdgpu_acpi.c drm/amdgpu: cleanup coding style in amdgpu_sync.c file drm/amdgpu: cleanup coding style in amdgpu_amdkfd.c drm/amdgpu: cleanup coding style in amdgpu_amdkfd_gpuvm.c drm/amdgpu/display: remove unneeded "default n" options John Clements (1): drm/amdgpu: added support for ras driver loading Lee, Alvin (1): drm/amd/display: Added debug option for forcing subvp num ways Leo Chen (2): drm/amd/display: Fixing DIG FIFO Error drm/amd/display: Adding log for spread_spectrum_info Li Ma (1): drm/amd/pm:add new gpu_metrics_v2_3 to acquire average temperature info Lijo Lazar
[PATCH 3/7] drm/amdgpu: Use vm status_lock to protect vm idle list
Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 4a1cb20deb2d..c3412709e626 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -211,7 +211,9 @@ static void amdgpu_vm_bo_moved(struct amdgpu_vm_bo_base *vm_bo) */ static void amdgpu_vm_bo_idle(struct amdgpu_vm_bo_base *vm_bo) { + spin_lock(&vm_bo->vm->status_lock); list_move(&vm_bo->vm_status, &vm_bo->vm->idle); + spin_unlock(&vm_bo->vm->status_lock); vm_bo->moved = false; } @@ -2554,6 +2556,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m) unsigned int total_done_objs = 0; unsigned int id = 0; + spin_lock(&vm->status_lock); seq_puts(m, "\tIdle BOs:\n"); list_for_each_entry_safe(bo_va, tmp, &vm->idle, base.vm_status) { if (!bo_va->base.bo) @@ -2591,7 +2594,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m) id = 0; seq_puts(m, "\tInvalidated BOs:\n"); - spin_lock(&vm->status_lock); list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) { if (!bo_va->base.bo) continue; -- 2.35.1
[PATCH 2/7] drm/amdgpu: Use vm status_lock to protect relocated list
Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 596f1ea8babc..4a1cb20deb2d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -240,10 +240,13 @@ static void amdgpu_vm_bo_invalidated(struct amdgpu_vm_bo_base *vm_bo) */ static void amdgpu_vm_bo_relocated(struct amdgpu_vm_bo_base *vm_bo) { - if (vm_bo->bo->parent) + if (vm_bo->bo->parent) { + spin_lock(&vm_bo->vm->status_lock); list_move(&vm_bo->vm_status, &vm_bo->vm->relocated); - else + spin_unlock(&vm_bo->vm->status_lock); + } else { amdgpu_vm_bo_idle(vm_bo); + } } /** @@ -680,9 +683,14 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev, struct amdgpu_vm_update_params params; struct amdgpu_vm_bo_base *entry; bool flush_tlb_needed = false; + LIST_HEAD(relocated); int r, idx; - if (list_empty(&vm->relocated)) + spin_lock(&vm->status_lock); + list_splice_init(&vm->relocated, &relocated); + spin_unlock(&vm->status_lock); + + if (list_empty(&relocated)) return 0; if (!drm_dev_enter(adev_to_drm(adev), &idx)) @@ -697,7 +705,7 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev, if (r) goto error; - list_for_each_entry(entry, &vm->relocated, vm_status) { + list_for_each_entry(entry, &relocated, vm_status) { /* vm_flush_needed after updating moved PDEs */ flush_tlb_needed |= entry->moved; @@ -713,9 +721,8 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev, if (flush_tlb_needed) atomic64_inc(&vm->tlb_seq); - while (!list_empty(&vm->relocated)) { - entry = list_first_entry(&vm->relocated, - struct amdgpu_vm_bo_base, + while (!list_empty(&relocated)) { + entry = list_first_entry(&relocated, struct amdgpu_vm_bo_base, vm_status); amdgpu_vm_bo_idle(entry); } @@ -912,6 +919,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t *vram_mem, { struct amdgpu_bo_va *bo_va, *tmp; + spin_lock(&vm->status_lock); list_for_each_entry_safe(bo_va, tmp, &vm->idle, base.vm_status) { if (!bo_va->base.bo) continue; @@ -936,7 +944,6 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t *vram_mem, amdgpu_bo_get_memory(bo_va->base.bo, vram_mem, gtt_mem, cpu_mem); } - spin_lock(&vm->status_lock); list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) { if (!bo_va->base.bo) continue; -- 2.35.1
[PATCH 4/7] drm/amdgpu: Use vm status_lock to protect vm moved list
Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index c3412709e626..d5d170de51ce 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -198,7 +198,9 @@ static void amdgpu_vm_bo_evicted(struct amdgpu_vm_bo_base *vm_bo) */ static void amdgpu_vm_bo_moved(struct amdgpu_vm_bo_base *vm_bo) { + spin_lock(&vm_bo->vm->status_lock); list_move(&vm_bo->vm_status, &vm_bo->vm->moved); + spin_unlock(&vm_bo->vm->status_lock); } /** @@ -1287,19 +1289,24 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev, int amdgpu_vm_handle_moved(struct amdgpu_device *adev, struct amdgpu_vm *vm) { - struct amdgpu_bo_va *bo_va, *tmp; + struct amdgpu_bo_va *bo_va; struct dma_resv *resv; bool clear; int r; - list_for_each_entry_safe(bo_va, tmp, &vm->moved, base.vm_status) { + spin_lock(&vm->status_lock); + while (!list_empty(&vm->moved)) { + bo_va = list_first_entry(&vm->moved, struct amdgpu_bo_va, + base.vm_status); + spin_unlock(&vm->status_lock); + /* Per VM BOs never need to be cleared in the page tables */ r = amdgpu_vm_bo_update(adev, bo_va, false); if (r) return r; + spin_lock(&vm->status_lock); } - spin_lock(&vm->status_lock); while (!list_empty(&vm->invalidated)) { bo_va = list_first_entry(&vm->invalidated, struct amdgpu_bo_va, base.vm_status); @@ -1396,7 +1403,9 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device *adev, if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv && !bo_va->base.moved) { + spin_lock(&vm->status_lock); list_move(&bo_va->base.vm_status, &vm->moved); + spin_unlock(&vm->status_lock); } trace_amdgpu_vm_bo_map(bo_va, mapping); } -- 2.35.1
[PATCH 7/7] drm/amdgpu: Fix amdgpu_vm_pt_free warning
Freeing a page table BO from a context where the vm resv lock is not held generates the warning below. Add a pt_free_work item to the vm to free page table BOs from the vm->pt_freed list: pass the vm resv lock status down from the page table update caller, and when called with the vm resv unlocked, add the vm_bo entry to vm->pt_freed and schedule pt_free_work instead of freeing directly. WARNING: CPU: 12 PID: 3238 at drivers/gpu/drm/ttm/ttm_bo.c:106 ttm_bo_set_bulk_move+0xa1/0xc0 Call Trace: amdgpu_vm_pt_free+0x42/0xd0 [amdgpu] amdgpu_vm_pt_free_dfs+0xb3/0xf0 [amdgpu] amdgpu_vm_ptes_update+0x52d/0x850 [amdgpu] amdgpu_vm_update_range+0x2a6/0x640 [amdgpu] svm_range_unmap_from_gpus+0x110/0x300 [amdgpu] svm_range_cpu_invalidate_pagetables+0x535/0x600 [amdgpu] __mmu_notifier_invalidate_range_start+0x1cd/0x230 unmap_vmas+0x9d/0x140 unmap_region+0xa8/0x110 Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 5 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 41 +-- 3 files changed, 47 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 9d7f9b3670d7..3402fb62b4ae 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2057,6 +2057,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm) spin_lock_init(&vm->status_lock); INIT_LIST_HEAD(&vm->freed); INIT_LIST_HEAD(&vm->done); + INIT_LIST_HEAD(&vm->pt_freed); + INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work); /* create scheduler entities for page table updates */ r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_NORMAL, @@ -2258,6 +2260,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm) amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm); + flush_work(&vm->pt_free_work); + root = amdgpu_bo_ref(vm->root.bo); amdgpu_bo_reserve(root, true); amdgpu_vm_set_pasid(adev, vm, 0); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 98895c8fef6c..2275c692d1ca 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -278,6 +278,10 @@ struct amdgpu_vm { /* BOs which are invalidated, has been updated in the PTs */ struct list_head done; + /* PT BOs scheduled to be freed and filled with zero if vm resv is not held */ + struct list_head pt_freed; + struct work_struct pt_free_work; + /* contains the page directory */ struct amdgpu_vm_bo_base root; struct dma_fence *last_update; @@ -473,6 +477,7 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params *params, int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params, uint64_t start, uint64_t end, uint64_t dst, uint64_t flags); +void amdgpu_vm_pt_free_work(struct work_struct *work); #if defined(CONFIG_DEBUG_FS) void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index 61a4b7182d44..25277380ef5f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -644,6 +644,27 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) amdgpu_bo_unref(&entry->bo); } +void amdgpu_vm_pt_free_work(struct work_struct *work) +{ + struct amdgpu_vm_bo_base *entry, *next; + struct amdgpu_vm *vm; + LIST_HEAD(pt_freed); + + vm = container_of(work, struct amdgpu_vm, pt_free_work); + + spin_lock(&vm->status_lock); + list_splice_init(&vm->pt_freed, &pt_freed); + spin_unlock(&vm->status_lock); + + /* flush_work in amdgpu_vm_fini ensures vm->root.bo is valid */ + amdgpu_bo_reserve(vm->root.bo, true); + + list_for_each_entry_safe(entry, next, &pt_freed, vm_status) + amdgpu_vm_pt_free(entry); + + amdgpu_bo_unreserve(vm->root.bo); +} + /** * amdgpu_vm_pt_free_dfs - free PD/PT levels * @@ -655,11 +676,24 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) */ static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev, struct amdgpu_vm *vm, - struct amdgpu_vm_pt_cursor *start) + struct amdgpu_vm_pt_cursor *start, + bool unlocked) { struct amdgpu_vm_pt_cursor cursor; struct amdgpu_vm_bo_base *entry; + if (unlocked) { + spin_lock(&vm->status_lock); + for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry) + list_move(&entry->vm_status, &vm->pt_freed); + + if (start) + list_move(&start->entry->vm_status, &vm->pt_freed); + spin_unlock(&vm->status_lock); +
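The deferred-free idea in this patch can be illustrated outside the kernel. The sketch below is a hypothetical stand-alone miniature, not the driver code: in the "unlocked" (MMU-notifier) path an entry cannot be freed on the spot because the root reservation is not held, so it is queued on a pt_freed list and a worker drains the list later; in the locked path it is freed immediately. All names and counters here are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list entry standing in for amdgpu_vm_bo_base. */
struct pt_entry {
	struct pt_entry *next;
};

struct vm_pt_demo {
	struct pt_entry *pt_freed;	/* stands in for vm->pt_freed */
	int freed_now;			/* frees done synchronously */
	int freed_deferred;		/* frees done by the worker */
};

/* If the caller cannot take the reservation (unlocked), queue the entry
 * for the worker; otherwise free it right away. */
static void pt_free_one(struct vm_pt_demo *vm, struct pt_entry *e,
			int unlocked)
{
	if (unlocked) {
		e->next = vm->pt_freed;
		vm->pt_freed = e;
	} else {
		vm->freed_now++;
	}
}

/* Stand-in for amdgpu_vm_pt_free_work(): with the "reservation" taken,
 * detach the deferred list and free everything on it. */
static void pt_free_work_demo(struct vm_pt_demo *vm)
{
	struct pt_entry *e = vm->pt_freed;

	vm->pt_freed = NULL;
	while (e) {
		struct pt_entry *next = e->next;

		vm->freed_deferred++;
		e = next;
	}
}
```

The real patch additionally serializes the list moves with vm->status_lock and flushes the work item in amdgpu_vm_fini so the root BO stays valid; the toy omits that locking.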
[PATCH 6/7] drm/amdgpu: Use vm status_lock to protect pt free
Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index 88de9f0d4728..61a4b7182d44 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -637,7 +637,10 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) } ttm_bo_set_bulk_move(&entry->bo->tbo, NULL); entry->bo->vm_bo = NULL; + + spin_lock(&entry->vm->status_lock); list_del(&entry->vm_status); + spin_unlock(&entry->vm->status_lock); amdgpu_bo_unref(&entry->bo); } -- 2.35.1
[PATCH 0/7] Fix amdgpu_vm_pt_free warning
When an SVM range is unmapped from the CPU, the mmu notifier callback updates the GPU page table to unmap the SVM range from the GPU. This is an unlocked page table update context, since we cannot take the vm reservation lock there. If unmapping from the GPU frees a pt bo, this triggers a warning because the vm reservation lock is not held. To fix the warning/race, we schedule pt_free_work to free the pt bo. That introduces another race when removing the vm_bo entry from the vm status lists: some of the vm status lists are protected by the vm invalidated lock while the rest are protected by the vm reservation lock. The purpose of this patch series is to use a single vm status_lock to protect all vm status lists, and to use it in pt_free_work to fix the race and remove the warning. Philip Yang (7): drm/amdgpu: Rename vm invalidate lock to status_lock drm/amdgpu: Use vm status_lock to protect relocated list drm/amdgpu: Use vm status_lock to protect vm idle list drm/amdgpu: Use vm status_lock to protect vm moved list drm/amdgpu: Use vm status_lock to protect vm evicted list drm/amdgpu: Use vm status_lock to protect pt free drm/amdgpu: Fix amdgpu_vm_pt_free warning drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 97 --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 9 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 44 +- 3 files changed, 117 insertions(+), 33 deletions(-) -- 2.35.1
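The core pattern the series applies everywhere — take the single status_lock, splice the whole status list onto a local head, drop the lock, then process the entries — can be sketched outside the kernel. The block below is a hypothetical stand-alone miniature, not the driver code: a plain pthread mutex and a hand-rolled singly linked list stand in for the kernel spinlock and struct list_head.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

struct vm_demo {
	pthread_mutex_t status_lock;	/* stands in for vm->status_lock */
	struct node *relocated;		/* stands in for vm->relocated */
};

/* Producer side: every list manipulation happens under the one lock. */
static void vm_demo_add(struct vm_demo *vm, struct node *n)
{
	pthread_mutex_lock(&vm->status_lock);
	n->next = vm->relocated;
	vm->relocated = n;
	pthread_mutex_unlock(&vm->status_lock);
}

/* Consumer side, mirroring the list_splice_init() step in
 * amdgpu_vm_update_pdes(): detach the whole list under the lock, then
 * walk it with the lock dropped.  Returns how many entries it saw. */
static int vm_demo_drain(struct vm_demo *vm)
{
	struct node *local, *n;
	int count = 0;

	pthread_mutex_lock(&vm->status_lock);
	local = vm->relocated;		/* splice: grab everything... */
	vm->relocated = NULL;		/* ...and leave the list empty */
	pthread_mutex_unlock(&vm->status_lock);

	for (n = local; n; n = n->next)
		count++;		/* "update the PDE" without the lock */
	return count;
}
```

The point of the splice is that the expensive per-entry work never runs with the lock held, so producers (the vm_bo state-machine helpers) are only ever blocked for the two pointer swaps.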
[PATCH 5/7] drm/amdgpu: Use vm status_lock to protect vm evicted list
Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 27 +- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index d5d170de51ce..9d7f9b3670d7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -183,10 +183,12 @@ static void amdgpu_vm_bo_evicted(struct amdgpu_vm_bo_base *vm_bo) struct amdgpu_bo *bo = vm_bo->bo; vm_bo->moved = true; + spin_lock(&vm_bo->vm->status_lock); if (bo->tbo.type == ttm_bo_type_kernel) list_move(&vm_bo->vm_status, &vm->evicted); else list_move_tail(&vm_bo->vm_status, &vm->evicted); + spin_unlock(&vm_bo->vm->status_lock); } /** * amdgpu_vm_bo_moved - vm_bo is moved @@ -370,12 +372,20 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm, int (*validate)(void *p, struct amdgpu_bo *bo), void *param) { - struct amdgpu_vm_bo_base *bo_base, *tmp; + struct amdgpu_vm_bo_base *bo_base; + struct amdgpu_bo *shadow; + struct amdgpu_bo *bo; int r; - list_for_each_entry_safe(bo_base, tmp, &vm->evicted, vm_status) { - struct amdgpu_bo *bo = bo_base->bo; - struct amdgpu_bo *shadow = amdgpu_bo_shadowed(bo); + spin_lock(&vm->status_lock); + while (!list_empty(&vm->evicted)) { + bo_base = list_first_entry(&vm->evicted, + struct amdgpu_vm_bo_base, + vm_status); + spin_unlock(&vm->status_lock); + + bo = bo_base->bo; + shadow = amdgpu_bo_shadowed(bo); r = validate(param, bo); if (r) @@ -392,7 +402,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm, vm->update_funcs->map_table(to_amdgpu_bo_vm(bo)); amdgpu_vm_bo_relocated(bo_base); } + spin_lock(&vm->status_lock); } + spin_unlock(&vm->status_lock); amdgpu_vm_eviction_lock(vm); vm->evicting = false; @@ -413,13 +425,18 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm, */ bool amdgpu_vm_ready(struct amdgpu_vm *vm) { + bool empty; bool ret; amdgpu_vm_eviction_lock(vm); ret = !vm->evicting; amdgpu_vm_eviction_unlock(vm); - return ret && list_empty(&vm->evicted); + spin_lock(&vm->status_lock); + empty = list_empty(&vm->evicted); + spin_unlock(&vm->status_lock); + + return ret && empty; } /** -- 2.35.1
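The loop shape this patch switches to — take the first entry under the lock, drop the lock for the expensive work, retake it before touching the list again — can be sketched as follows. This is a hedged stand-alone approximation, not the driver: the real code leaves the entry on the evicted list and relies on validate() moving it to another status list, whereas this toy simply detaches it so the loop terminates.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct ent {
	struct ent *next;
	int validated;
};

struct evict_demo {
	pthread_mutex_t lock;	/* stands in for vm->status_lock */
	struct ent *evicted;	/* stands in for vm->evicted */
};

/* Pop-first-entry loop: list_for_each_entry_safe() would be unsafe here
 * because other threads may edit the list while the lock is dropped. */
static int drain_evicted(struct evict_demo *vm)
{
	struct ent *e;
	int n = 0;

	pthread_mutex_lock(&vm->lock);
	while (vm->evicted) {
		e = vm->evicted;	/* list_first_entry() */
		vm->evicted = e->next;	/* detach (simplification) */
		pthread_mutex_unlock(&vm->lock);

		e->validated = 1;	/* validate() runs unlocked */
		n++;

		pthread_mutex_lock(&vm->lock);
	}
	pthread_mutex_unlock(&vm->lock);
	return n;
}
```

Restarting from the list head on every iteration is what makes concurrent additions safe: any entry added while the lock was dropped is simply picked up on the next pass.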
[PATCH 1/7] drm/amdgpu: Rename vm invalidate lock to status_lock
The vm status_lock will be used to protect all vm status lists. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 30 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 4 +++- 2 files changed, 18 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 690fd4f639f1..596f1ea8babc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -225,9 +225,9 @@ static void amdgpu_vm_bo_idle(struct amdgpu_vm_bo_base *vm_bo) */ static void amdgpu_vm_bo_invalidated(struct amdgpu_vm_bo_base *vm_bo) { - spin_lock(&vm_bo->vm->invalidated_lock); + spin_lock(&vm_bo->vm->status_lock); list_move(&vm_bo->vm_status, &vm_bo->vm->invalidated); - spin_unlock(&vm_bo->vm->invalidated_lock); + spin_unlock(&vm_bo->vm->status_lock); } /** @@ -256,9 +256,9 @@ static void amdgpu_vm_bo_relocated(struct amdgpu_vm_bo_base *vm_bo) */ static void amdgpu_vm_bo_done(struct amdgpu_vm_bo_base *vm_bo) { - spin_lock(&vm_bo->vm->invalidated_lock); + spin_lock(&vm_bo->vm->status_lock); list_move(&vm_bo->vm_status, &vm_bo->vm->done); - spin_unlock(&vm_bo->vm->invalidated_lock); + spin_unlock(&vm_bo->vm->status_lock); } /** @@ -936,7 +936,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t *vram_mem, amdgpu_bo_get_memory(bo_va->base.bo, vram_mem, gtt_mem, cpu_mem); } - spin_lock(&vm->invalidated_lock); + spin_lock(&vm->status_lock); list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) { if (!bo_va->base.bo) continue; @@ -949,7 +949,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t *vram_mem, amdgpu_bo_get_memory(bo_va->base.bo, vram_mem, gtt_mem, cpu_mem); } - spin_unlock(&vm->invalidated_lock); + spin_unlock(&vm->status_lock); } /** * amdgpu_vm_bo_update - update all BO mappings in the vm page table @@ -1290,12 +1290,12 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev, return r; } - spin_lock(&vm->invalidated_lock); + spin_lock(&vm->status_lock); while (!list_empty(&vm->invalidated)) { bo_va = list_first_entry(&vm->invalidated, struct amdgpu_bo_va, base.vm_status); resv = bo_va->base.bo->tbo.base.resv; - spin_unlock(&vm->invalidated_lock); + spin_unlock(&vm->status_lock); /* Try to reserve the BO to avoid clearing its ptes */ if (!amdgpu_vm_debug && dma_resv_trylock(resv)) @@ -1310,9 +1310,9 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev, if (!clear) dma_resv_unlock(resv); - spin_lock(&vm->invalidated_lock); + spin_lock(&vm->status_lock); } - spin_unlock(&vm->invalidated_lock); + spin_unlock(&vm->status_lock); return 0; } @@ -1763,9 +1763,9 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev, } } - spin_lock(&vm->invalidated_lock); + spin_lock(&vm->status_lock); list_del(&bo_va->base.vm_status); - spin_unlock(&vm->invalidated_lock); + spin_unlock(&vm->status_lock); list_for_each_entry_safe(mapping, next, &bo_va->valids, list) { list_del(&mapping->list); @@ -2019,7 +2019,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm) INIT_LIST_HEAD(&vm->moved); INIT_LIST_HEAD(&vm->idle); INIT_LIST_HEAD(&vm->invalidated); - spin_lock_init(&vm->invalidated_lock); + spin_lock_init(&vm->status_lock); INIT_LIST_HEAD(&vm->freed); INIT_LIST_HEAD(&vm->done); @@ -2584,7 +2584,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m) id = 0; seq_puts(m, "\tInvalidated BOs:\n"); - spin_lock(&vm->invalidated_lock); + spin_lock(&vm->status_lock); list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) { if (!bo_va->base.bo) continue; @@ -2599,7 +2599,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m) continue; total_done += amdgpu_bo_print_info(id++, bo_va->base.bo, m); } - spin_unlock(&vm->invalidated_lock); + spin_unlock(&vm->status_lock); total_done_objs = id; seq_printf(m, "\tTotal idle size:%12lld\tobjs:\t%d\n", total_idle, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 9ecb7f663e19..98895c8fef6c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -268,7 +268,9 @@ struct amdgpu_vm { /* regular invalidated BOs, but not yet updated in the PT */ struct list_head
Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
Am 16.09.22 um 15:50 schrieb Andrey Grodzovsky: On 2022-09-16 01:18, Christian König wrote: Am 15.09.22 um 22:37 schrieb Andrey Grodzovsky: On 2022-09-15 15:26, Christian König wrote: Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky: On 2022-09-15 06:09, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Christian, The test sequence is executing a compute engine hang while running a lot of containers submitting gfx jobs. We have advanced tdr mode and mode2 reset enabled on the driver. When a compute hang job timeout happens, the 2 jobs on the gfx pending list may be signaled after drm_sched_stop. So they will not be removed from the pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set. At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and removed from the pending list. At the resubmit step, the second job (with the signaled bit) will be resubmitted. Since it still has the signaled bit, drm_sched_job_done will be called directly. This decreases the hw_rq_count, which allows more jobs to be emitted, but does not clean the fence_drv rcu ptr. This results in an overflow in the fence_drv. Since we use num_fences_mask in amdgpu_fence_process, when the overflow happens the signal of some jobs will be skipped, which results in an infinite wait for the fence_drv rcu ptr. So closing the irq before sched_stop could avoid signaling jobs after drm_sched_stop. And signaling jobs one by one in fence_process instead of using a mask will handle the overflow situation. Another fix could be to skip submitting jobs which already signaled during the resubmit stage, which may look cleaner. Please help give some advice. How about the code below instead ? The real problem is that we reuse a dma fence twice, which is not according to the dma fence design, so maybe this can help ? diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 8adeb7469f1e..033f0ae16784 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd if (job && job->job_run_counter) { /* reinit seq for resubmitted jobs */ fence->seqno = seq; + + /* For resubmitted jobs clear the signaled bit */ + clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); + Upstream will pretty much kill you for that. Re-setting a fence from a signaled to an unsignaled state is a massive no-go. Christian. Is it worse than doing fence->seqno = seq; ? This is already a huge hack, no ? No, it's equally bad. I don't think we can do either. Christian. And all those ugly hacks are there because we reuse a dma_fence (hw_fence embedded into the job) and correct me if I am wrong but I don't think a dma_fence is ever supposed to be reused. Exactly that, yes. Christian. So maybe like Victor suggested we should move close and flush irq before sched_stop - this in my opinion should solve the issue, but Victor - why do you then still need the change in amdgpu_fence_process ? You will not have the overflow situation because by moving irq_disable before stop, any job that signaled will be removed from the scheduler pending list anyway. Also note that this change reverts 'drm/amdgpu: sanitize fence numbers' and could reintroduce that bug. Andrey Andrey /* To be in line with external fence creation and other drivers */ dma_fence_get(fence); } else { Andrey Thanks, Victor -Original Message- From: Koenig, Christian Sent: Thursday, September 15, 2022 2:32 PM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Deng, Emily Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow Am 15.09.22 um 06:02 schrieb Zhao, Victor: [AMD Official Use Only - General] Ping.
Hi @Koenig, Christian and @Grodzovsky, Andrey, We found some reset related issues during stress testing on this sequence. Please help give some comments. Thanks, Victor -Original Message- From: Victor Zhao Sent: Wednesday, September 14, 2022 6:10 PM To: amd-gfx@lists.freedesktop.org Cc: Deng, Emily ; Grodzovsky, Andrey ; Zhao, Victor Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow [background] For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from another ring (e.g. gfx) may continue signaling during the drm_sched_stop stage. The signaled bit will not be cleared. At the resubmit stage after recovery, a job with the hw fence signaled bit set will call job done directly instead of going through fence processing. This makes the hw_rq_count decrease while the rcu fence pointer is not yet cleared. Then an overflow happens in the fence driver slots and some jobs may be skipped, leaving the rcu pointer not cleared, which makes an infinite wait for the slot on the next fence emitted. This infinite wait
Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
On 2022-09-16 01:18, Christian König wrote: Am 15.09.22 um 22:37 schrieb Andrey Grodzovsky: On 2022-09-15 15:26, Christian König wrote: Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky: On 2022-09-15 06:09, Zhao, Victor wrote: [AMD Official Use Only - General] Hi Christian, The test sequence is executing a compute engine hang while running a lot of containers submitting gfx jobs. We have advanced tdr mode and mode2 reset enabled on the driver. When a compute hang job timeout happens, the 2 jobs on the gfx pending list may be signaled after drm_sched_stop. So they will not be removed from the pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set. At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and removed from the pending list. At the resubmit step, the second job (with the signaled bit) will be resubmitted. Since it still has the signaled bit, drm_sched_job_done will be called directly. This decreases the hw_rq_count, which allows more jobs to be emitted, but does not clean the fence_drv rcu ptr. This results in an overflow in the fence_drv. Since we use num_fences_mask in amdgpu_fence_process, when the overflow happens the signal of some jobs will be skipped, which results in an infinite wait for the fence_drv rcu ptr. So closing the irq before sched_stop could avoid signaling jobs after drm_sched_stop. And signaling jobs one by one in fence_process instead of using a mask will handle the overflow situation. Another fix could be to skip submitting jobs which already signaled during the resubmit stage, which may look cleaner. Please help give some advice. How about the code below instead ? The real problem is that we reuse a dma fence twice, which is not according to the dma fence design, so maybe this can help ? diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 8adeb7469f1e..033f0ae16784 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd if (job && job->job_run_counter) { /* reinit seq for resubmitted jobs */ fence->seqno = seq; + + /* For resubmitted jobs clear the signaled bit */ + clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); + Upstream will pretty much kill you for that. Re-setting a fence from a signaled to an unsignaled state is a massive no-go. Christian. Is it worse than doing fence->seqno = seq; ? This is already a huge hack, no ? No, it's equally bad. I don't think we can do either. Christian. And all those ugly hacks are there because we reuse a dma_fence (hw_fence embedded into the job) and correct me if I am wrong but I don't think a dma_fence is ever supposed to be reused. So maybe like Victor suggested we should move close and flush irq before sched_stop - this in my opinion should solve the issue, but Victor - why do you then still need the change in amdgpu_fence_process ? You will not have the overflow situation because by moving irq_disable before stop, any job that signaled will be removed from the scheduler pending list anyway. Also note that this change reverts 'drm/amdgpu: sanitize fence numbers' and could reintroduce that bug. Andrey Andrey /* To be in line with external fence creation and other drivers */ dma_fence_get(fence); } else { Andrey Thanks, Victor -Original Message- From: Koenig, Christian Sent: Thursday, September 15, 2022 2:32 PM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey Cc: Deng, Emily Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow Am 15.09.22 um 06:02 schrieb Zhao, Victor: [AMD Official Use Only - General] Ping.
Hi @Koenig, Christian and @Grodzovsky, Andrey, We found some reset related issues during a stress test on the sequence. Please help give some comments. Thanks, Victor -Original Message- From: Victor Zhao Sent: Wednesday, September 14, 2022 6:10 PM To: amd-gfx@lists.freedesktop.org Cc: Deng, Emily ; Grodzovsky, Andrey ; Zhao, Victor Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow [background] For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from another ring (e.g. gfx) may continue signaling during the drm_sched_stop stage. The signaled bit will not be cleared. At the resubmit stage after recovery, a job with the hw fence signaled bit set will call job done directly instead of going through fence processing. This makes hw_rq_count decrease but leaves the rcu fence pointer not cleared. Then an overflow happens in the fence driver slots and some jobs may be skipped, leaving the rcu pointer not cleared, which makes an infinite wait for the slot on the next fence emitted. This infinite wait causes a job timeout on the emitting job, and the driver will be stuck at its sched stop step
Re: [PATCH] drm/amdgpu: add MES and MES-KIQ version in debugfs
[Public] Acked-by: Alex Deucher From: amd-gfx on behalf of Yifan Zhang Sent: Friday, September 16, 2022 12:38 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Huang, Tim ; Du, Xiaojian ; Zhang, Yifan Subject: [PATCH] drm/amdgpu: add MES and MES-KIQ version in debugfs This patch adds MES and MES-KIQ version in debugfs. Signed-off-by: Yifan Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 24 include/uapi/drm/amdgpu_drm.h | 4 2 files changed, 28 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 1369c25448dc..bb0ed358909f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -328,6 +328,14 @@ static int amdgpu_firmware_info(struct drm_amdgpu_info_firmware *fw_info, fw_info->ver = adev->psp.cap_fw_version; fw_info->feature = adev->psp.cap_feature_version; break; + case AMDGPU_INFO_FW_MES_KIQ: + fw_info->ver = adev->mes.ucode_fw_version[0]; + fw_info->feature = 0; + break; + case AMDGPU_INFO_FW_MES: + fw_info->ver = adev->mes.ucode_fw_version[1]; + fw_info->feature = 0; + break; default: return -EINVAL; } @@ -1581,6 +1589,22 @@ static int amdgpu_debugfs_firmware_info_show(struct seq_file *m, void *unused) fw_info.feature, fw_info.ver); } + /* MES_KIQ */ + query_fw.fw_type = AMDGPU_INFO_FW_MES_KIQ; + ret = amdgpu_firmware_info(&fw_info, &query_fw, adev); + if (ret) + return ret; + seq_printf(m, "MES_KIQ feature version: %u, firmware version: 0x%08x\n", + fw_info.feature, fw_info.ver); + + /* MES */ + query_fw.fw_type = AMDGPU_INFO_FW_MES; + ret = amdgpu_firmware_info(&fw_info, &query_fw, adev); + if (ret) + return ret; + seq_printf(m, "MES feature version: %u, firmware version: 0x%08x\n", + fw_info.feature, fw_info.ver); + seq_printf(m, "VBIOS version: %s\n", ctx->vbios_version); return 0; diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h index c2c9c674a223..12fdf62730b8 100644 --- a/include/uapi/drm/amdgpu_drm.h +++ 
b/include/uapi/drm/amdgpu_drm.h @@ -755,6 +755,10 @@ struct drm_amdgpu_cs_chunk_data { #define AMDGPU_INFO_FW_TOC 0x15 /* Subquery id: Query CAP firmware version */ #define AMDGPU_INFO_FW_CAP 0x16 + /* Subquery id: Query MES_KIQ firmware version */ + #define AMDGPU_INFO_FW_MES_KIQ 0x17 + /* Subquery id: Query MES firmware version */ + #define AMDGPU_INFO_FW_MES 0x18 /* number of bytes moved for TTM migration */ #define AMDGPU_INFO_NUM_BYTES_MOVED 0x0f -- 2.37.3
Re: [PATCH V3 46/47] drm/amd/display: Fix failures of disabling primary plans
On 2022-09-15 22:44, Rodrigo Siqueira Jordao wrote: > > First of all, thanks a lot for your review. I want to take this opportunity > to discuss this topic in more depth and learn more from you and others. That really should have happened before submitting this patch. It was luck that I happened to notice it fly by. > On 2022-09-15 04:55, Michel Dänzer wrote: >> On 2022-09-14 22:08, Alex Hung wrote: >>> On 2022-09-14 10:55, Michel Dänzer wrote: On 2022-09-14 18:30, Alex Hung wrote: > On 2022-09-14 07:40, Michel Dänzer wrote: >> On 2022-09-14 15:31, Michel Dänzer wrote: >>> On 2022-09-14 07:10, Wayne Lin wrote: From: Alex Hung [Why & How] This fixes kernel errors when IGT disables primary planes during the tests kms_universal_plane::functional_test_pipe/pageflip_test_pipe. >>> >>> NAK. >>> >>> This essentially reverts commit b836a274b797 ("drm/amdgpu/dc: Require >>> primary plane to be enabled whenever the CRTC is") (except that it goes >>> even further and completely removes the requirement for any HW plane to >>> be enabled when the HW cursor is), so it would reintroduce the issues >>> described in that commit log. >> >> Actually not exactly the same issues, due to going even further than >> reverting my fix. >> >> Instead, the driver will claim that an atomic commit which enables the >> CRTC and the cursor plane, while leaving all other KMS planes disabled, >> succeeds. But the HW cursor will not actually be visible. > > I did not observe problems with cursors (GNOME 42.4 - desktop and > youtube/mpv video playback: windowed/fullscreen). Are there steps to > reproduce cursor problems? As described in my last follow-up e-mail: An atomic KMS commit which enables the CRTC and the cursor plane, but disables all other KMS planes for the CRTC. The commit will succeed, but will not result in the HW cursor being actually visible. (I don't know of any specific application or test which exercises this) >>> >>> Did you mean cursor plane depends on primary plane (i.e. 
no primary plane = >>> no visible HW cursor)? >> >> Sort of. I understand the HW cursor isn't an actual separate plane in AMD >> HW. Instead, the HW cursor can be displayed as part of any other HW plane. >> This means that the HW cursor can only be visible if any other plane is >> enabled. > > The commit that you mentioned (b836a274b797) was created to address some > issues reported by the user. Please, correct me if I'm wrong, but the > original issue could be reproduced by following these steps on Gnome with > Ellesmere: > > 1. Lock the screen and wait for suspending; > 2. Wake up the system a few minutes later; > 3. See two cursors, one that can be used and another one that is not working. The primary symptom (which affected myself as a user as well, that was the motivation for addressing it) was that mutter fell back to SW cursor, because the legacy KMS cursor ioctl returned an error. There were corresponding messages from mutter, visible e.g. in the journal. That the HW cursor remained visible for some users was a secondary symptom, probably due to a separate bug. I don't remember ever hitting that myself. Note that current versions of mutter use atomic KMS and probably don't hit this issue anymore by default. You'd need to set the environment variable MUTTER_DEBUG_ENABLE_ATOMIC_KMS=0 or use an older version of mutter. > It is not evident to me why we cannot reproduce this problem. Do you have > some suggestions? If we find a case showing this bug, we can add it as part > of our tests. Something like this should work: 1. Call drmModeRmFB for the FB currently assigned to the primary plane. This implicitly disables the primary plane. 2. Call drmModeSetCursor(2) with a non-0 BO handle and position & dimensions such that the cursor would be visible on the CRTC. 
Before b836a274b797, this resulted in drmModeSetCursor(2) returning EINVAL due to if (state->enable && state->active && does_crtc_have_active_cursor(state) && dm_crtc_state->active_planes == 0) return -EINVAL; (The b836a274b797 commit log describes a similar error when trying to flip the legacy DPMS property from off to on. My memory is hazy on this, but I guess that would have prevented the monitor from turning back on after DPMS off if the primary/cursor planes happened to be disable/enabled respectively when trying to toggle the property) >>> If there is no primary plane, what scenario would it be required to draw a >>> cursor? >>> >>> If this is a rare case, would it still be a concern? >> >> Yes. In the KMS API, the cursor plane is like any other plane. A Wayland >> compositor or other user space may legitimately try to display something >> (not necessarily a "cursor") using only the cursor plane. The driver must >>
[PATCH 8/8] drm/amdgpu: add gang submit frontend v5
Allows submitting jobs as gang which needs to run on multiple engines at the same time. All members of the gang get the same implicit, explicit and VM dependencies. So no gang member will start running until everything else is ready. The last job is considered the gang leader (usually a submission to the GFX ring) and used for signaling output dependencies. Each job is remembered individually as user of a buffer object, so there is no joining of work at the end. v2: rebase and fix review comments from Andrey and Yogesh v3: use READ instead of BOOKKEEP for now because of VM unmaps, set gang leader only when necessary v4: fix order of pushing jobs and adding fences found by Trigger. v5: fix job index calculation and adding IBs to jobs Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 269 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 12 +- 5 files changed, 195 insertions(+), 99 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 4f5bd96000ec..c7b1a2dfde13 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -686,6 +686,7 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev, ib->length_dw = ib_len; /* This works for NO_HWS. 
TODO: need to handle without knowing VMID */ job->vmid = vmid; + job->num_ibs = 1; ret = amdgpu_ib_schedule(ring, 1, ib, job, ); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 58088c663125..964052377991 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -64,11 +64,11 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, return 0; } -static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, - struct drm_amdgpu_cs_chunk_ib *chunk_ib, - unsigned int *num_ibs) +static int amdgpu_cs_job_idx(struct amdgpu_cs_parser *p, +struct drm_amdgpu_cs_chunk_ib *chunk_ib) { struct drm_sched_entity *entity; + unsigned int i; int r; r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type, @@ -77,17 +77,38 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, if (r) return r; - /* Abort if there is no run queue associated with this entity. -* Possibly because of disabled HW IP*/ + /* +* Abort if there is no run queue associated with this entity. +* Possibly because of disabled HW IP. 
+*/ if (entity->rq == NULL) return -EINVAL; - /* Currently we don't support submitting to multiple entities */ - if (p->entity && p->entity != entity) + /* Check if we can add this IB to some existing job */ + for (i = 0; i < p->gang_size; ++i) + if (p->entities[i] == entity) + return i; + + /* If not increase the gang size if possible */ + if (i == AMDGPU_CS_GANG_SIZE) return -EINVAL; - p->entity = entity; - ++(*num_ibs); + p->entities[i] = entity; + p->gang_size = i + 1; + return i; +} + +static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, + struct drm_amdgpu_cs_chunk_ib *chunk_ib, + unsigned int *num_ibs) +{ + int r; + + r = amdgpu_cs_job_idx(p, chunk_ib); + if (r < 0) + return r; + + ++(num_ibs[r]); return 0; } @@ -161,11 +182,12 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, union drm_amdgpu_cs *cs) { struct amdgpu_fpriv *fpriv = p->filp->driver_priv; + unsigned int num_ibs[AMDGPU_CS_GANG_SIZE] = { }; struct amdgpu_vm *vm = >vm; uint64_t *chunk_array_user; uint64_t *chunk_array; - unsigned size, num_ibs = 0; uint32_t uf_offset = 0; + unsigned int size; int ret; int i; @@ -228,7 +250,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, if (size < sizeof(struct drm_amdgpu_cs_chunk_ib)) goto free_partial_kdata; - ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, _ibs); + ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, num_ibs); if (ret) goto free_partial_kdata; break; @@ -265,21 +287,28 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, } } - ret = amdgpu_job_alloc(p->adev, num_ibs, >job, vm); - if (ret) - goto free_all_kdata; + if (!p->gang_size) +
[PATCH 4/8] drm/amdgpu: move entity selection and job init earlier during CS
Initialize the entity for the CS and scheduler job much earlier. v2: fix job initialisation order and use correct scheduler instance Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 54 - drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +++ 2 files changed, 30 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 265ed2118a80..58088c663125 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -68,6 +68,25 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p, struct drm_amdgpu_cs_chunk_ib *chunk_ib, unsigned int *num_ibs) { + struct drm_sched_entity *entity; + int r; + + r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type, + chunk_ib->ip_instance, + chunk_ib->ring, ); + if (r) + return r; + + /* Abort if there is no run queue associated with this entity. +* Possibly because of disabled HW IP*/ + if (entity->rq == NULL) + return -EINVAL; + + /* Currently we don't support submitting to multiple entities */ + if (p->entity && p->entity != entity) + return -EINVAL; + + p->entity = entity; ++(*num_ibs); return 0; } @@ -250,6 +269,10 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, if (ret) goto free_all_kdata; + ret = drm_sched_job_init(>job->base, p->entity, >vm); + if (ret) + goto free_all_kdata; + if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) { ret = -ECANCELED; goto free_all_kdata; @@ -286,32 +309,11 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p, { struct drm_amdgpu_cs_chunk_ib *chunk_ib = chunk->kdata; struct amdgpu_fpriv *fpriv = p->filp->driver_priv; + struct amdgpu_ring *ring = amdgpu_job_ring(p->job); struct amdgpu_ib *ib = >job->ibs[*num_ibs]; struct amdgpu_vm *vm = >vm; - struct drm_sched_entity *entity; - struct amdgpu_ring *ring; int r; - r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type, - chunk_ib->ip_instance, - chunk_ib->ring, ); - if (r) - return r; - - /* -* Abort if there 
is no run queue associated with this entity. -* Possibly because of disabled HW IP. -*/ - if (entity->rq == NULL) - return -EINVAL; - - /* Currently we don't support submitting to multiple entities */ - if (p->entity && p->entity != entity) - return -EINVAL; - - p->entity = entity; - - ring = to_amdgpu_ring(entity->rq->sched); /* MM engine doesn't support user fences */ if (p->uf_entry.tv.bo && ring->funcs->no_user_fence) return -EINVAL; @@ -978,8 +980,8 @@ static void trace_amdgpu_cs_ibs(struct amdgpu_cs_parser *parser) static int amdgpu_cs_patch_ibs(struct amdgpu_cs_parser *p) { - struct amdgpu_ring *ring = to_amdgpu_ring(p->entity->rq->sched); struct amdgpu_job *job = p->job; + struct amdgpu_ring *ring = amdgpu_job_ring(job); unsigned int i; int r; @@ -1171,10 +1173,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, job = p->job; p->job = NULL; - r = drm_sched_job_init(>base, p->entity, >vm); - if (r) - goto error_unlock; - drm_sched_job_arm(>base); /* No memory allocation is allowed while holding the notifier lock. @@ -1231,8 +1229,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, error_abort: drm_sched_job_cleanup(>base); mutex_unlock(>adev->notifier_lock); - -error_unlock: amdgpu_job_free(job); return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h index 2a1961bf1194..866d35bbe073 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h @@ -72,6 +72,11 @@ struct amdgpu_job { struct amdgpu_ibibs[]; }; +static inline struct amdgpu_ring *amdgpu_job_ring(struct amdgpu_job *job) +{ + return to_amdgpu_ring(job->base.entity->rq->sched); +} + int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, struct amdgpu_job **job, struct amdgpu_vm *vm); int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size, -- 2.25.1
[PATCH 7/8] drm/amdgpu: add gang submit backend v2
Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. v2: fix logic inversion, improve documentation, fix rcu Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 35 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 28 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_job.h| 3 ++ 4 files changed, 67 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 79bb6fd83094..ae9371b172e3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -885,6 +885,7 @@ struct amdgpu_device { u64 fence_context; unsignednum_rings; struct amdgpu_ring *rings[AMDGPU_MAX_RINGS]; + struct dma_fence __rcu *gang_submit; boolib_pool_ready; struct amdgpu_sa_managerib_pools[AMDGPU_IB_POOL_MAX]; struct amdgpu_sched gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX]; @@ -1294,6 +1295,8 @@ u32 amdgpu_device_pcie_port_rreg(struct amdgpu_device *adev, u32 reg); void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev, u32 reg, u32 v); +struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev, + struct dma_fence *gang); /* atpx handler */ #if defined(CONFIG_VGA_SWITCHEROO) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 943c9e750575..b6f6445e50d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3503,6 +3503,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, adev->gmc.gart_size = 512 * 1024 * 1024; adev->accel_working = false; adev->num_rings = 0; + RCU_INIT_POINTER(adev->gang_submit, 
dma_fence_get_stub()); adev->mman.buffer_funcs = NULL; adev->mman.buffer_funcs_ring = NULL; adev->vm_manager.vm_pte_funcs = NULL; @@ -3984,6 +3985,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev) release_firmware(adev->firmware.gpu_info_fw); adev->firmware.gpu_info_fw = NULL; adev->accel_working = false; + dma_fence_put(rcu_dereference_protected(adev->gang_submit, true)); amdgpu_reset_fini(adev); @@ -5919,3 +5921,36 @@ void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev, (void)RREG32(data); spin_unlock_irqrestore(>pcie_idx_lock, flags); } + +/** + * amdgpu_device_switch_gang - switch to a new gang + * @adev: amdgpu_device pointer + * @gang: the gang to switch to + * + * Try to switch to a new gang. + * Returns: NULL if we switched to the new gang or a reference to the current + * gang leader. + */ +struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev, + struct dma_fence *gang) +{ + struct dma_fence *old = NULL; + + do { + dma_fence_put(old); + rcu_read_lock(); + old = dma_fence_get_rcu_safe(>gang_submit); + rcu_read_unlock(); + + if (old == gang) + break; + + if (!dma_fence_is_signaled(old)) + return old; + + } while (cmpxchg((struct dma_fence __force **)>gang_submit, +old, gang) != old); + + dma_fence_put(old); + return NULL; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 877b3c22a8a8..cfbe19cfe9af 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -173,11 +173,29 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job) dma_fence_put(>hw_fence); } +void amdgpu_job_set_gang_leader(struct amdgpu_job *job, + struct amdgpu_job *leader) +{ + struct dma_fence *fence = >base.s_fence->scheduled; + + WARN_ON(job->gang_submit); + + /* +* Don't add a reference when we are the gang leader to avoid circle +* dependency. 
+*/ + if (job != leader) + dma_fence_get(fence); + job->gang_submit = fence; +} + void amdgpu_job_free(struct amdgpu_job *job) { amdgpu_job_free_resources(job); amdgpu_sync_free(>sync); amdgpu_sync_free(>sched_sync); + if (job->gang_submit != >base.s_fence->scheduled) +
[PATCH 3/8] drm/amdgpu: fix user fence CS check
An offset of zero is valid, check if the BO is present or not. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 5c37dde97ff2..265ed2118a80 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -313,7 +313,7 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p, ring = to_amdgpu_ring(entity->rq->sched); /* MM engine doesn't support user fences */ - if (p->job->uf_addr && ring->funcs->no_user_fence) + if (p->uf_entry.tv.bo && ring->funcs->no_user_fence) return -EINVAL; if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX && -- 2.25.1
[PATCH 1/8] drm/amdgpu: cleanup CS pass2 v5
Cleanup the coding style and function names to represent the data they process for pass2 as well. Go over the chunks only twice now instead of multiple times. v2: fix job initialisation order and use correct scheduler instance v3: try to move all functional changes into a separate patch. v4: separate reordering, pass1 and pass2 change v5: fix va_start calculation Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 352 - 1 file changed, 168 insertions(+), 184 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index e104d7ef3c3d..c04073f4ead5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -278,93 +278,84 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, return ret; } -static int amdgpu_cs_ib_fill(struct amdgpu_device *adev, -struct amdgpu_cs_parser *parser) +static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p, + struct amdgpu_cs_chunk *chunk, + unsigned int *num_ibs, + unsigned int *ce_preempt, + unsigned int *de_preempt) { - struct amdgpu_fpriv *fpriv = parser->filp->driver_priv; + struct drm_amdgpu_cs_chunk_ib *chunk_ib = chunk->kdata; + struct amdgpu_fpriv *fpriv = p->filp->driver_priv; + struct amdgpu_ib *ib = >job->ibs[*num_ibs]; struct amdgpu_vm *vm = >vm; - int r, ce_preempt = 0, de_preempt = 0; + struct drm_sched_entity *entity; struct amdgpu_ring *ring; - int i, j; - - for (i = 0, j = 0; i < parser->nchunks && j < parser->job->num_ibs; i++) { - struct amdgpu_cs_chunk *chunk; - struct amdgpu_ib *ib; - struct drm_amdgpu_cs_chunk_ib *chunk_ib; - struct drm_sched_entity *entity; - - chunk = >chunks[i]; - ib = >job->ibs[j]; - chunk_ib = (struct drm_amdgpu_cs_chunk_ib *)chunk->kdata; + int r; - if (chunk->chunk_id != AMDGPU_CHUNK_ID_IB) - continue; + r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type, + chunk_ib->ip_instance, + chunk_ib->ring, ); + if (r) + return r; - if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX && - 
chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT) { - if (chunk_ib->flags & AMDGPU_IB_FLAG_CE) - ce_preempt++; - else - de_preempt++; + /* +* Abort if there is no run queue associated with this entity. +* Possibly because of disabled HW IP. +*/ + if (entity->rq == NULL) + return -EINVAL; - /* each GFX command submit allows 0 or 1 IB preemptible for CE & DE */ - if (ce_preempt > 1 || de_preempt > 1) - return -EINVAL; - } + /* Currently we don't support submitting to multiple entities */ + if (p->entity && p->entity != entity) + return -EINVAL; - r = amdgpu_ctx_get_entity(parser->ctx, chunk_ib->ip_type, - chunk_ib->ip_instance, chunk_ib->ring, - ); - if (r) - return r; + p->entity = entity; - if (chunk_ib->flags & AMDGPU_IB_FLAG_PREAMBLE) - parser->job->preamble_status |= - AMDGPU_PREAMBLE_IB_PRESENT; + ring = to_amdgpu_ring(entity->rq->sched); + /* MM engine doesn't support user fences */ + if (p->job->uf_addr && ring->funcs->no_user_fence) + return -EINVAL; - if (parser->entity && parser->entity != entity) - return -EINVAL; + if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX && + chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT) { + if (chunk_ib->flags & AMDGPU_IB_FLAG_CE) + (*ce_preempt)++; + else + (*de_preempt)++; - /* Return if there is no run queue associated with this entity. -* Possibly because of disabled HW IP*/ - if (entity->rq == NULL) + /* Each GFX command submit allows only 1 IB max +* preemptible for CE & DE */ + if (*ce_preempt > 1 || *de_preempt > 1) return -EINVAL; + } - parser->entity = entity; - - ring = to_amdgpu_ring(entity->rq->sched); - r = amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ? - chunk_ib->ib_bytes : 0, -
[PATCH 6/8] drm/amdgpu: cleanup instance limit on VCN4 v3
Similar to what we did for VCN3 use the job instead of the parser entity. Cleanup the coding style quite a bit as well. v2: merge improved application check into this patch v3: finally fix the check Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 46 +++ 1 file changed, 25 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c index 09c89faa8c27..bdabe5da8d75 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -1591,21 +1591,23 @@ static void vcn_v4_0_unified_ring_set_wptr(struct amdgpu_ring *ring) } } -static int vcn_v4_0_limit_sched(struct amdgpu_cs_parser *p) +static int vcn_v4_0_limit_sched(struct amdgpu_cs_parser *p, + struct amdgpu_job *job) { struct drm_gpu_scheduler **scheds; /* The create msg must be in the first IB submitted */ - if (atomic_read(>entity->fence_seq)) + if (atomic_read(>base.entity->fence_seq)) return -EINVAL; - scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_ENC] - [AMDGPU_RING_PRIO_0].sched; - drm_sched_entity_modify_sched(p->entity, scheds, 1); + scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC] + [AMDGPU_RING_PRIO_DEFAULT].sched; + drm_sched_entity_modify_sched(job->base.entity, scheds, 1); return 0; } -static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr) +static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, struct amdgpu_job *job, + uint64_t addr) { struct ttm_operation_ctx ctx = { false, false }; struct amdgpu_bo_va_mapping *map; @@ -1676,7 +1678,7 @@ static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr) if (create[0] == 0x7 || create[0] == 0x10 || create[0] == 0x11) continue; - r = vcn_v4_0_limit_sched(p); + r = vcn_v4_0_limit_sched(p, job); if (r) goto out; } @@ -1689,32 +1691,34 @@ static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr) #define RADEON_VCN_ENGINE_TYPE_DECODE (0x0003) static int vcn_v4_0_ring_patch_cs_in_place(struct 
amdgpu_cs_parser *p, - struct amdgpu_job *job, - struct amdgpu_ib *ib) + struct amdgpu_job *job, + struct amdgpu_ib *ib) { - struct amdgpu_ring *ring = to_amdgpu_ring(p->entity->rq->sched); - struct amdgpu_vcn_decode_buffer *decode_buffer = NULL; + struct amdgpu_ring *ring = amdgpu_job_ring(job); + struct amdgpu_vcn_decode_buffer *decode_buffer; + uint64_t addr; uint32_t val; - int r = 0; /* The first instance can decode anything */ if (!ring->me) - return r; + return 0; /* unified queue ib header has 8 double words. */ if (ib->length_dw < 8) - return r; + return 0; val = amdgpu_ib_get_value(ib, 6); //RADEON_VCN_ENGINE_TYPE + if (val != RADEON_VCN_ENGINE_TYPE_DECODE) + return 0; - if (val == RADEON_VCN_ENGINE_TYPE_DECODE) { - decode_buffer = (struct amdgpu_vcn_decode_buffer *)>ptr[10]; + decode_buffer = (struct amdgpu_vcn_decode_buffer *)>ptr[10]; - if (decode_buffer->valid_buf_flag & 0x1) - r = vcn_v4_0_dec_msg(p, ((u64)decode_buffer->msg_buffer_address_hi) << 32 | - decode_buffer->msg_buffer_address_lo); - } - return r; + if (!(decode_buffer->valid_buf_flag & 0x1)) + return 0; + + addr = ((u64)decode_buffer->msg_buffer_address_hi) << 32 | + decode_buffer->msg_buffer_address_lo; + return vcn_v4_0_dec_msg(p, job, addr); } static const struct amdgpu_ring_funcs vcn_v4_0_unified_ring_vm_funcs = { -- 2.25.1
[PATCH 5/8] drm/amdgpu: revert "fix limiting AV1 to the first instance on VCN3" v3
This reverts commit 250195ff744f260c169f5427422b6f39c58cb883. The job should now be initialized when we reach the parser functions. v2: merge improved application check into this patch v3: back to the original test, but use the right ring Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index 39405f0db824..9c8b5fd99037 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -1761,21 +1761,23 @@ static const struct amdgpu_ring_funcs vcn_v3_0_dec_sw_ring_vm_funcs = { .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper, }; -static int vcn_v3_0_limit_sched(struct amdgpu_cs_parser *p) +static int vcn_v3_0_limit_sched(struct amdgpu_cs_parser *p, + struct amdgpu_job *job) { struct drm_gpu_scheduler **scheds; /* The create msg must be in the first IB submitted */ - if (atomic_read(>entity->fence_seq)) + if (atomic_read(>base.entity->fence_seq)) return -EINVAL; scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC] [AMDGPU_RING_PRIO_DEFAULT].sched; - drm_sched_entity_modify_sched(p->entity, scheds, 1); + drm_sched_entity_modify_sched(job->base.entity, scheds, 1); return 0; } -static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr) +static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, struct amdgpu_job *job, + uint64_t addr) { struct ttm_operation_ctx ctx = { false, false }; struct amdgpu_bo_va_mapping *map; @@ -1846,7 +1848,7 @@ static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr) if (create[0] == 0x7 || create[0] == 0x10 || create[0] == 0x11) continue; - r = vcn_v3_0_limit_sched(p); + r = vcn_v3_0_limit_sched(p, job); if (r) goto out; } @@ -1860,7 +1862,7 @@ static int vcn_v3_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p, struct amdgpu_job *job, struct amdgpu_ib *ib) { - struct amdgpu_ring *ring = 
to_amdgpu_ring(p->entity->rq->sched); + struct amdgpu_ring *ring = amdgpu_job_ring(job); uint32_t msg_lo = 0, msg_hi = 0; unsigned i; int r; @@ -1879,7 +1881,8 @@ static int vcn_v3_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p, msg_hi = val; } else if (reg == PACKET0(p->adev->vcn.internal.cmd, 0) && val == 0) { - r = vcn_v3_0_dec_msg(p, ((u64)msg_hi) << 32 | msg_lo); + r = vcn_v3_0_dec_msg(p, job, +((u64)msg_hi) << 32 | msg_lo); if (r) return r; } -- 2.25.1
[PATCH 2/8] drm/amdgpu: cleanup error handling in amdgpu_cs_parser_bos
Return early on success and so remove all those "if (r)" in the error path. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 37 +- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index c04073f4ead5..5c37dde97ff2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -933,35 +933,34 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, if (r) goto error_validate; - amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved, -p->bytes_moved_vis); - - amdgpu_job_set_resources(p->job, p->bo_list->gds_obj, -p->bo_list->gws_obj, p->bo_list->oa_obj); - - if (!r && p->uf_entry.tv.bo) { + if (p->uf_entry.tv.bo) { struct amdgpu_bo *uf = ttm_to_amdgpu_bo(p->uf_entry.tv.bo); r = amdgpu_ttm_alloc_gart(>tbo); + if (r) + goto error_validate; + p->job->uf_addr += amdgpu_bo_gpu_offset(uf); } + amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved, +p->bytes_moved_vis); + amdgpu_job_set_resources(p->job, p->bo_list->gds_obj, +p->bo_list->gws_obj, p->bo_list->oa_obj); + return 0; + error_validate: - if (r) - ttm_eu_backoff_reservation(>ticket, >validated); + ttm_eu_backoff_reservation(>ticket, >validated); out_free_user_pages: - if (r) { - amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) { - struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo); + amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) { + struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo); - if (!e->user_pages) - continue; - amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); - kvfree(e->user_pages); - e->user_pages = NULL; - } - mutex_unlock(>bo_list->bo_list_mutex); + if (!e->user_pages) + continue; + amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm); + kvfree(e->user_pages); + e->user_pages = NULL; } return r; } -- 2.25.1
Final gang submit patches
Hey guys, so thanks to Ruijing I was finally able to hammer out all the known VCN regressions from this patch set. Alex, can you review those? AFAIK you are pretty much the only other person deep enough in the CS IOCTL for that. Going to tackle the userptr issue Bas stumbled over next. Thanks, Christian.
RE: [PATCH 0/4] support query rlcv/rlcp firmware version
[AMD Official Use Only - General] Series is Reviewed-by: Feifei Xu -Original Message- From: amd-gfx On Behalf Of Hawking Zhang Sent: Friday, September 16, 2022 1:00 AM To: amd-gfx@lists.freedesktop.org; Gao, Likun ; Deucher, Alexander Cc: Zhang, Hawking Subject: [PATCH 0/4] support query rlcv/rlcp firmware version To allow query rlcv/rlcp firmware verion info Hawking Zhang (4): drm/amdgpu: save rlcv/rlcp ucode version in amdgpu_gfx drm/amdgpu: support print rlc v2_x ucode hdr drm/amdgpu: add two new subquery ids drm/amdgpu: add rlcv/rlcp version info to debugfs drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 + drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 24 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 168 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 4 + drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 5 + include/uapi/drm/amdgpu_drm.h | 4 + 6 files changed, 159 insertions(+), 50 deletions(-) -- 2.17.1