RE: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-16 Thread Zhao, Victor
[AMD Official Use Only - General]

Hi Andrey,

Yes, moving the irq disable can fix the issue. The change in amdgpu_fence_process 
is just meant to make sure the driver can correct itself from an overflow 
situation. I didn't know about the previous issue there.

Do you know if the issue still exists? Or is it on VCE only?


Thanks,
Victor



-Original Message-
From: Grodzovsky, Andrey  
Sent: Friday, September 16, 2022 9:50 PM
To: Koenig, Christian ; Zhao, Victor 
; amd-gfx@lists.freedesktop.org
Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow


On 2022-09-16 01:18, Christian König wrote:
> On 15.09.22 at 22:37, Andrey Grodzovsky wrote:
>>
>> On 2022-09-15 15:26, Christian König wrote:
>>> On 15.09.22 at 20:29, Andrey Grodzovsky wrote:

 On 2022-09-15 06:09, Zhao, Victor wrote:
> [AMD Official Use Only - General]
>
> Hi Christian,
>
> The test sequence is executing a compute engine hang while running 
> a lot of containers submitting gfx jobs. We have advanced tdr mode 
> and mode2 reset enabled on the driver.
> When a compute hang job timeout happens, the 2 jobs on the gfx 
> pending list may be signaled after drm_sched_stop. So they will not 
> be removed from the pending list but have the 
> DMA_FENCE_FLAG_SIGNALED_BIT set.
> At the amdgpu_device_recheck_guilty_jobs step, the first job will 
> be rerun and removed from the pending list.
> At the resubmit step, the second job (with the signaled bit) will be 
> resubmitted. Since it still has the signaled bit, drm_sched_job_done 
> will be called directly. This decreases hw_rq_count, which allows 
> more jobs to be emitted, but does not clear the fence_drv RCU pointer.
> This results in an overflow in the fence_drv. Since we use 
> num_fences_mask in amdgpu_fence_process, when the overflow happens, 
> the signaling of some jobs will be skipped, which results in an 
> infinite wait for the fence_drv RCU pointer.
>
> So closing the irq before sched_stop avoids signaling jobs after 
> drm_sched_stop, and signaling jobs one by one in fence_process 
> instead of using a mask handles the overflow situation.
>
> Another fix could be to skip submitting jobs which have already 
> signaled during the resubmit stage, which may look cleaner.
>
> Please help give some advice.
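
To make the overflow described above more concrete, here is a minimal
userspace sketch (not driver code): the fence driver keeps the last N
emitted fences in a ring indexed by "seq & num_fences_mask", so once more
fences are emitted than the ring holds, older seqnos are overwritten and
their waiters never see a signal. The ring depth and names below are made
up for illustration.

#include <stdio.h>
#include <stdint.h>

#define NUM_FENCES      4                 /* hypothetical ring depth */
#define NUM_FENCES_MASK (NUM_FENCES - 1)

int main(void)
{
	uint32_t ring[NUM_FENCES] = { 0 };  /* slot -> seqno stored there */
	uint32_t emit_seq = 0;

	/* Emit 6 fences without processing any of them (the overflow case). */
	for (int i = 0; i < 6; i++) {
		emit_seq++;
		ring[emit_seq & NUM_FENCES_MASK] = emit_seq;
	}

	/* Seqnos 1 and 2 were overwritten by 5 and 6; anyone waiting on the
	 * masked slot for them would wait forever. */
	for (uint32_t seq = 1; seq <= emit_seq; seq++)
		printf("seq %u -> slot %u now holds seq %u%s\n",
		       seq, seq & NUM_FENCES_MASK, ring[seq & NUM_FENCES_MASK],
		       ring[seq & NUM_FENCES_MASK] == seq ? "" : "  (lost)");

	return 0;
}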


 How about the code below instead? The real problem is that we 
 reuse a dma fence twice, which is not in line with the dma fence 
 design, so maybe this can help?


 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
 index 8adeb7469f1e..033f0ae16784 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
 @@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring 
 *ring, struct dma_fence **f, struct amd
     if (job && job->job_run_counter) {
     /* reinit seq for resubmitted jobs */
     fence->seqno = seq;
 +
 +   /* For resubmitted job clear the signaled bit */
 +   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
 &fence->flags);
 +
>>>
>>> Upstream will pretty much kill you for that.
>>>
>>> Re-setting a fence from a signaled to an unsignaled state is a 
>>> massive no-go.
>>>
>>> Christian.
>>
>>
>> Is it worse than doing fence->seqno = seq; ? This is already a huge 
>> hack, no ?
>
> No, it's as equally bad. I don't think we can do either.
>
> Christian.


And all those ugly hacks are there because we reuse a dma_fence (the hw_fence 
embedded into the job), and correct me if I am wrong, but I don't think a 
dma_fence is ever supposed to be reused.

So maybe, like Victor suggested, we should move the irq close and flush before 
sched_stop - in my opinion this should solve the issue. But Victor - why do you 
then still need the change in amdgpu_fence_process? You will not have the 
overflow situation, because by moving the irq disable before the stop, any job 
that signaled will be removed from the scheduler pending list anyway. Also note 
that this change reverts 'drm/amdgpu: sanitize fence numbers' and could 
reintroduce that bug.

Andrey
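
(For reference, a rough sketch of the "skip already signaled jobs at resubmit"
alternative Victor mentioned; the function name and surrounding context are
invented and this has not been tested against the actual reset path, so treat
it purely as an illustration.)

static void example_resubmit_skip_signaled(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *s_job;

	list_for_each_entry(s_job, &sched->pending_list, list) {
		struct dma_fence *hw_fence = s_job->s_fence->parent;

		/*
		 * The hw fence signaled after drm_sched_stop(), so the job is
		 * still on the pending list but already done; resubmitting it
		 * would call drm_sched_job_done() straight away and corrupt
		 * the fence bookkeeping as described above. Skip it instead.
		 */
		if (hw_fence && dma_fence_is_signaled(hw_fence))
			continue;

		/* ... resubmit the job as usual ... */
	}
}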


>
>>
>> Andrey
>>
>>
>>>

     /* TO be inline with external fence creation and 
 other drivers */
     dma_fence_get(fence);
     } else {


 Andrey


>
>
> Thanks,
> Victor
>
>
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Thursday, September 15, 2022 2:32 PM
> To: Zhao, Victor ; 
> amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 
> 
> Cc: Deng, Emily 
> Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by 
> overflow
>
>
>
> On 15.09.22 at 06:02, Zhao, Victor wrote:
>> [AMD Official Use Only - General]
>>
>> Ping.
>>
>> Hi @Koenig, Christian and @Grodzovsky, 

Re: [PATCH 1/2] drm/amd/display: Reduce number of arguments of dml314's CalculateWatermarksAndDRAMSpeedChangeSupport()

2022-09-16 Thread Nathan Chancellor
On Fri, Sep 16, 2022 at 03:04:53PM -0700, Tom Rix wrote:
> 
> On 9/16/22 2:06 PM, Nathan Chancellor wrote:
> > Most of the arguments are identical between the two call sites and they
> > can be accessed through the 'struct vba_vars_st' pointer. This reduces
> > the total amount of stack space that
> > dml314_ModeSupportAndSystemConfigurationFull() uses by 240 bytes with
> > LLVM 16 (2216 -> 1976), helping clear up the following clang warning:
> > 
> >
> > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:4020:6:
> >  error: stack frame size (2216) exceeds limit (2048) in 
> > 'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
> >void dml314_ModeSupportAndSystemConfigurationFull(struct 
> > display_mode_lib *mode_lib)
> > ^
> >1 error generated.
> > 
> > Link: https://github.com/ClangBuiltLinux/linux/issues/1710
> > Reported-by: "kernelci.org bot" 
> > Signed-off-by: Nathan Chancellor 
> 
> Nathan,
> 
> I like this change but I don't think it goes far enough.
> 
> There are many similar functions in this file, and there are other
> display_mode_vba_*.c files that pass too many vba_vars_st elements.
> 
> I think most/all of the static functions should be refactored to pass
> vba_vars_st * or vba_vars_st **
> 
> fwiw, i found the calling function 
> DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation,
> hilariously long :)
> 
> I'll do the change if you want to pass this to me, I promise not to add to
> the above function name.

Right, there is definitely more that could be done here; I just picked
the couple of functions that would appear to make the most impact, as I
am only concerned with keeping the code warning free with clang so that
-Werror does not break us. I think it makes sense to take this series to
fix the warnings right now (especially since this patch has technically
already been accepted, as it was applied to dcn31) then follow up with
refactoring, which I am more than happy to let you do if you so desire
:)

Thank you for the input as always!

Cheers,
Nathan
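
For readers following along, here is a tiny self-contained illustration of
the refactor pattern being discussed - passing the struct pointer plus a
plane index instead of copying members onto the stack as individual
parameters. The struct and formula below are invented stand-ins, not the
real vba_vars_st or DML math.

#include <stdio.h>

struct vba_vars_example {
	unsigned int NumberOfActivePlanes;
	double UrgentLatency;
	double VRatio[8];
};

/* Before: every value travels as its own argument (stack heavy). */
static double watermark_before(unsigned int planes, double urgent_latency,
			       double vratio)
{
	return urgent_latency * vratio * planes;
}

/* After: one pointer plus an index; the callee dereferences what it needs. */
static double watermark_after(const struct vba_vars_example *v, unsigned int k)
{
	return v->UrgentLatency * v->VRatio[k] * v->NumberOfActivePlanes;
}

int main(void)
{
	struct vba_vars_example v = {
		.NumberOfActivePlanes = 2,
		.UrgentLatency = 4.0,
		.VRatio = { 1.0, 1.5 },
	};

	printf("%f %f\n",
	       watermark_before(v.NumberOfActivePlanes, v.UrgentLatency, v.VRatio[1]),
	       watermark_after(&v, 1));
	return 0;
}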

> > ---
> > 
> > This is just commit ab2ac59c32db ("drm/amd/display: Reduce number of
> > arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport()")
> > applied to dml314.
> > 
> >   .../dc/dml/dcn314/display_mode_vba_314.c  | 248 --
> >   1 file changed, 52 insertions(+), 196 deletions(-)
> > 
> > diff --git 
> > a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c 
> > b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
> > index 2829f179f982..32ceb72f7a14 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
> > +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
> > @@ -325,64 +325,28 @@ static void 
> > CalculateVupdateAndDynamicMetadataParameters(
> >   static void CalculateWatermarksAndDRAMSpeedChangeSupport(
> > struct display_mode_lib *mode_lib,
> > unsigned int PrefetchMode,
> > -   unsigned int NumberOfActivePlanes,
> > -   unsigned int MaxLineBufferLines,
> > -   unsigned int LineBufferSize,
> > -   unsigned int WritebackInterfaceBufferSize,
> > double DCFCLK,
> > double ReturnBW,
> > -   bool SynchronizedVBlank,
> > -   unsigned int dpte_group_bytes[],
> > -   unsigned int MetaChunkSize,
> > double UrgentLatency,
> > double ExtraLatency,
> > -   double WritebackLatency,
> > -   double WritebackChunkSize,
> > double SOCCLK,
> > -   double DRAMClockChangeLatency,
> > -   double SRExitTime,
> > -   double SREnterPlusExitTime,
> > -   double SRExitZ8Time,
> > -   double SREnterPlusExitZ8Time,
> > double DCFCLKDeepSleep,
> > unsigned int DETBufferSizeY[],
> > unsigned int DETBufferSizeC[],
> > unsigned int SwathHeightY[],
> > unsigned int SwathHeightC[],
> > -   unsigned int LBBitPerPixel[],
> > double SwathWidthY[],
> > double SwathWidthC[],
> > -   double HRatio[],
> > -   double HRatioChroma[],
> > -   unsigned int vtaps[],
> > -   unsigned int VTAPsChroma[],
> > -   double VRatio[],
> > -   double VRatioChroma[],
> > -   unsigned int HTotal[],
> > -   double PixelClock[],
> > -   unsigned int BlendingAndTiming[],
> > unsigned int DPPPerPlane[],
> > double BytePerPixelDETY[],
> > double BytePerPixelDETC[],
> > -   double DSTXAfterScaler[],
> > -   double DSTYAfterScaler[],
> > -   bool WritebackEnable[],
> > -   enum source_format_class WritebackPixelFormat[],
> > -   double WritebackDestinationWidth[],
> > -   double WritebackDestinationHeight[],
> > -   double 

[PATCH 2/2] drm/amd/display: Reduce number of arguments of dml314's CalculateFlipSchedule()

2022-09-16 Thread Nathan Chancellor
Most of the arguments are identical between the two call sites and they
can be accessed through the 'struct vba_vars_st' pointer. This reduces
the total amount of stack space that
dml314_ModeSupportAndSystemConfigurationFull() uses by 112 bytes with
LLVM 16 (1976 -> 1864), helping clear up the following clang warning:

  
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:4020:6:
 error: stack frame size (2216) exceeds limit (2048) in 
'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
  void dml314_ModeSupportAndSystemConfigurationFull(struct display_mode_lib 
*mode_lib)
   ^
  1 error generated.

Link: https://github.com/ClangBuiltLinux/linux/issues/1710
Reported-by: "kernelci.org bot" 
Signed-off-by: Nathan Chancellor 
---

This is just commit 1dbec5b4b0ef ("drm/amd/display: Reduce number of
arguments of dml31's CalculateFlipSchedule()") applied to dml314.

 .../dc/dml/dcn314/display_mode_vba_314.c  | 172 +-
 1 file changed, 47 insertions(+), 125 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
index 32ceb72f7a14..e4dfa714207a 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
@@ -265,33 +265,13 @@ static void CalculateRowBandwidth(
 
 static void CalculateFlipSchedule(
struct display_mode_lib *mode_lib,
+   unsigned int k,
double HostVMInefficiencyFactor,
double UrgentExtraLatency,
double UrgentLatency,
-   unsigned int GPUVMMaxPageTableLevels,
-   bool HostVMEnable,
-   unsigned int HostVMMaxNonCachedPageTableLevels,
-   bool GPUVMEnable,
-   double HostVMMinPageSize,
double PDEAndMetaPTEBytesPerFrame,
double MetaRowBytes,
-   double DPTEBytesPerRow,
-   double BandwidthAvailableForImmediateFlip,
-   unsigned int TotImmediateFlipBytes,
-   enum source_format_class SourcePixelFormat,
-   double LineTime,
-   double VRatio,
-   double VRatioChroma,
-   double Tno_bw,
-   bool DCCEnable,
-   unsigned int dpte_row_height,
-   unsigned int meta_row_height,
-   unsigned int dpte_row_height_chroma,
-   unsigned int meta_row_height_chroma,
-   double *DestinationLinesToRequestVMInImmediateFlip,
-   double *DestinationLinesToRequestRowInImmediateFlip,
-   double *final_flip_bw,
-   bool *ImmediateFlipSupportedForPipe);
+   double DPTEBytesPerRow);
 static double CalculateWriteBackDelay(
enum source_format_class WritebackPixelFormat,
double WritebackHRatio,
@@ -2892,33 +2872,13 @@ static void 
DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman
for (k = 0; k < v->NumberOfActivePlanes; ++k) {
CalculateFlipSchedule(
mode_lib,
+   k,
HostVMInefficiencyFactor,
v->UrgentExtraLatency,
v->UrgentLatency,
-   v->GPUVMMaxPageTableLevels,
-   v->HostVMEnable,
-   
v->HostVMMaxNonCachedPageTableLevels,
-   v->GPUVMEnable,
-   v->HostVMMinPageSize,
v->PDEAndMetaPTEBytesFrame[k],
v->MetaRowByte[k],
-   v->PixelPTEBytesPerRow[k],
-   
v->BandwidthAvailableForImmediateFlip,
-   v->TotImmediateFlipBytes,
-   v->SourcePixelFormat[k],
-   v->HTotal[k] / v->PixelClock[k],
-   v->VRatio[k],
-   v->VRatioChroma[k],
-   v->Tno_bw[k],
-   v->DCCEnable[k],
-   v->dpte_row_height[k],
-   v->meta_row_height[k],
-   v->dpte_row_height_chroma[k],
-   v->meta_row_height_chroma[k],

[PATCH 1/2] drm/amd/display: Reduce number of arguments of dml314's CalculateWatermarksAndDRAMSpeedChangeSupport()

2022-09-16 Thread Nathan Chancellor
Most of the arguments are identical between the two call sites and they
can be accessed through the 'struct vba_vars_st' pointer. This reduces
the total amount of stack space that
dml314_ModeSupportAndSystemConfigurationFull() uses by 240 bytes with
LLVM 16 (2216 -> 1976), helping clear up the following clang warning:

  
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:4020:6:
 error: stack frame size (2216) exceeds limit (2048) in 
'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
  void dml314_ModeSupportAndSystemConfigurationFull(struct display_mode_lib 
*mode_lib)
   ^
  1 error generated.

Link: https://github.com/ClangBuiltLinux/linux/issues/1710
Reported-by: "kernelci.org bot" 
Signed-off-by: Nathan Chancellor 
---

This is just commit ab2ac59c32db ("drm/amd/display: Reduce number of
arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport()")
applied to dml314.

 .../dc/dml/dcn314/display_mode_vba_314.c  | 248 --
 1 file changed, 52 insertions(+), 196 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
index 2829f179f982..32ceb72f7a14 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/display_mode_vba_314.c
@@ -325,64 +325,28 @@ static void CalculateVupdateAndDynamicMetadataParameters(
 static void CalculateWatermarksAndDRAMSpeedChangeSupport(
struct display_mode_lib *mode_lib,
unsigned int PrefetchMode,
-   unsigned int NumberOfActivePlanes,
-   unsigned int MaxLineBufferLines,
-   unsigned int LineBufferSize,
-   unsigned int WritebackInterfaceBufferSize,
double DCFCLK,
double ReturnBW,
-   bool SynchronizedVBlank,
-   unsigned int dpte_group_bytes[],
-   unsigned int MetaChunkSize,
double UrgentLatency,
double ExtraLatency,
-   double WritebackLatency,
-   double WritebackChunkSize,
double SOCCLK,
-   double DRAMClockChangeLatency,
-   double SRExitTime,
-   double SREnterPlusExitTime,
-   double SRExitZ8Time,
-   double SREnterPlusExitZ8Time,
double DCFCLKDeepSleep,
unsigned int DETBufferSizeY[],
unsigned int DETBufferSizeC[],
unsigned int SwathHeightY[],
unsigned int SwathHeightC[],
-   unsigned int LBBitPerPixel[],
double SwathWidthY[],
double SwathWidthC[],
-   double HRatio[],
-   double HRatioChroma[],
-   unsigned int vtaps[],
-   unsigned int VTAPsChroma[],
-   double VRatio[],
-   double VRatioChroma[],
-   unsigned int HTotal[],
-   double PixelClock[],
-   unsigned int BlendingAndTiming[],
unsigned int DPPPerPlane[],
double BytePerPixelDETY[],
double BytePerPixelDETC[],
-   double DSTXAfterScaler[],
-   double DSTYAfterScaler[],
-   bool WritebackEnable[],
-   enum source_format_class WritebackPixelFormat[],
-   double WritebackDestinationWidth[],
-   double WritebackDestinationHeight[],
-   double WritebackSourceHeight[],
bool UnboundedRequestEnabled,
unsigned int CompressedBufferSizeInkByte,
enum clock_change_support *DRAMClockChangeSupport,
-   double *UrgentWatermark,
-   double *WritebackUrgentWatermark,
-   double *DRAMClockChangeWatermark,
-   double *WritebackDRAMClockChangeWatermark,
double *StutterExitWatermark,
double *StutterEnterPlusExitWatermark,
double *Z8StutterExitWatermark,
-   double *Z8StutterEnterPlusExitWatermark,
-   double *MinActiveDRAMClockChangeLatencySupported);
+   double *Z8StutterEnterPlusExitWatermark);
 
 static void CalculateDCFCLKDeepSleep(
struct display_mode_lib *mode_lib,
@@ -3041,64 +3005,28 @@ static void 
DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman
CalculateWatermarksAndDRAMSpeedChangeSupport(
mode_lib,
PrefetchMode,
-   v->NumberOfActivePlanes,
-   v->MaxLineBufferLines,
-   v->LineBufferSize,
-   v->WritebackInterfaceBufferSize,
v->DCFCLK,
v->ReturnBW,
- 

Re: Final gang submit patches

2022-09-16 Thread Alex Deucher
On Fri, Sep 16, 2022 at 5:08 AM Christian König
 wrote:
>
> Hey guys,
>
> so thanks to Ruijing I was finally able to hammer out all the known VCN
> regressions from this patch set.
>
> Alex can you review those? AFAIK you are pretty much the only other
> person deep enough in the CS IOCTL for that.

These look good to me.  Just a couple of comments on some of the
patches.  We probably also want a patch to bump the driver version so
userspace knows when this is available.  With those addressed, the
series is:
Reviewed-by: Alex Deucher 
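
(Side note on the version bump: in amdgpu that is normally a one-line change
to the KMS minor version in amdgpu_drv.c, which userspace can query through
the drm version ioctl. The value below is purely illustrative, not the actual
number a gang-submit enablement patch would use.)

#define KMS_DRIVER_MAJOR	3
#define KMS_DRIVER_MINOR	49	/* hypothetical bump advertising gang submit */
#define KMS_DRIVER_PATCHLEVEL	0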

>
> Going to tackle the userptr issue Bas stumbled over next.
>
> Thanks,
> Christian.
>
>


Re: [PATCH 8/8] drm/amdgpu: add gang submit frontend v5

2022-09-16 Thread Alex Deucher
On Fri, Sep 16, 2022 at 5:09 AM Christian König
 wrote:
>
> Allows submitting jobs as gang which needs to run on multiple engines at the
> same time.
>
> All members of the gang get the same implicit, explicit and VM dependencies. 
> So
> no gang member will start running until everything else is ready.
>
> The last job is considered the gang leader (usually a submission to the GFX
> ring) and used for signaling output dependencies.
>
> Each job is remembered individually as user of a buffer object, so there is no
> joining of work at the end.
>
> v2: rebase and fix review comments from Andrey and Yogesh
> v3: use READ instead of BOOKKEEP for now because of VM unmaps, set gang
> leader only when necessary
> v4: fix order of pushing jobs and adding fences found by Trigger.
> v5: fix job index calculation and adding IBs to jobs
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 269 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h |  10 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  |  12 +-
>  5 files changed, 195 insertions(+), 99 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index 4f5bd96000ec..c7b1a2dfde13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -686,6 +686,7 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
> ib->length_dw = ib_len;
> /* This works for NO_HWS. TODO: need to handle without knowing VMID */
> job->vmid = vmid;
> +   job->num_ibs = 1;
>
> ret = amdgpu_ib_schedule(ring, 1, ib, job, );
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 58088c663125..964052377991 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -64,11 +64,11 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
> *p,
> return 0;
>  }
>
> -static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
> -  struct drm_amdgpu_cs_chunk_ib *chunk_ib,
> -  unsigned int *num_ibs)
> +static int amdgpu_cs_job_idx(struct amdgpu_cs_parser *p,
> +struct drm_amdgpu_cs_chunk_ib *chunk_ib)
>  {
> struct drm_sched_entity *entity;
> +   unsigned int i;
> int r;
>
> r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type,
> @@ -77,17 +77,38 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
> if (r)
> return r;
>
> -   /* Abort if there is no run queue associated with this entity.
> -* Possibly because of disabled HW IP*/
> +   /*
> +* Abort if there is no run queue associated with this entity.
> +* Possibly because of disabled HW IP.
> +*/
> if (entity->rq == NULL)
> return -EINVAL;
>
> -   /* Currently we don't support submitting to multiple entities */
> -   if (p->entity && p->entity != entity)
> +   /* Check if we can add this IB to some existing job */
> +   for (i = 0; i < p->gang_size; ++i)
> +   if (p->entities[i] == entity)
> +   return i;
> +
> +   /* If not increase the gang size if possible */
> +   if (i == AMDGPU_CS_GANG_SIZE)
> return -EINVAL;
>
> -   p->entity = entity;
> -   ++(*num_ibs);
> +   p->entities[i] = entity;
> +   p->gang_size = i + 1;
> +   return i;
> +}
> +
> +static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
> +  struct drm_amdgpu_cs_chunk_ib *chunk_ib,
> +  unsigned int *num_ibs)
> +{
> +   int r;
> +
> +   r = amdgpu_cs_job_idx(p, chunk_ib);
> +   if (r < 0)
> +   return r;
> +
> +   ++(num_ibs[r]);
> return 0;
>  }
>
> @@ -161,11 +182,12 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
>union drm_amdgpu_cs *cs)
>  {
> struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
> +   unsigned int num_ibs[AMDGPU_CS_GANG_SIZE] = { };
> struct amdgpu_vm *vm = >vm;
> uint64_t *chunk_array_user;
> uint64_t *chunk_array;
> -   unsigned size, num_ibs = 0;
> uint32_t uf_offset = 0;
> +   unsigned int size;
> int ret;
> int i;
>
> @@ -228,7 +250,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
> if (size < sizeof(struct drm_amdgpu_cs_chunk_ib))
> goto free_partial_kdata;
>
> -   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, 
> _ibs);
> +   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, num_ibs);
> if (ret)
> goto 

Re: [PATCH 3/8] drm/amdgpu: fix user fence CS check

2022-09-16 Thread Christian König

On 16.09.22 at 20:01, Alex Deucher wrote:

On Fri, Sep 16, 2022 at 5:09 AM Christian König
 wrote:

An offset of zero is valid, check if the BO is present or not.

Signed-off-by: Christian König 

Since this is a bug fix, should we make the first patch in the series?


Good point, going to re-arrange this before pushing.

Christian.



Alex


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 5c37dde97ff2..265ed2118a80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -313,7 +313,7 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,

 ring = to_amdgpu_ring(entity->rq->sched);
 /* MM engine doesn't support user fences */
-   if (p->job->uf_addr && ring->funcs->no_user_fence)
+   if (p->uf_entry.tv.bo && ring->funcs->no_user_fence)
 return -EINVAL;

 if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
--
2.25.1





Re: [PATCH 3/8] drm/amdgpu: fix user fence CS check

2022-09-16 Thread Alex Deucher
On Fri, Sep 16, 2022 at 5:09 AM Christian König
 wrote:
>
> An offset of zero is valid, check if the BO is present or not.
>
> Signed-off-by: Christian König 

Since this is a bug fix, should we make the first patch in the series?

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 5c37dde97ff2..265ed2118a80 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -313,7 +313,7 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
>
> ring = to_amdgpu_ring(entity->rq->sched);
> /* MM engine doesn't support user fences */
> -   if (p->job->uf_addr && ring->funcs->no_user_fence)
> +   if (p->uf_entry.tv.bo && ring->funcs->no_user_fence)
> return -EINVAL;
>
> if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
> --
> 2.25.1
>


Re: [PATCH 7/7] drm/amdgpu: Fix amdgpu_vm_pt_free warning

2022-09-16 Thread Christian König

On 16.09.22 at 17:05, Philip Yang wrote:

Freeing a page table BO from a vm resv unlocked context generates the
warning below.

Add a pt_free_work in vm to free page table BOs from the vm->pt_freed list.
Pass the vm resv unlock status from the page table update caller, add the
vm_bo entry to the vm->pt_freed list, and schedule the pt_free_work when
called with the vm resv unlocked.

WARNING: CPU: 12 PID: 3238 at
drivers/gpu/drm/ttm/ttm_bo.c:106 ttm_bo_set_bulk_move+0xa1/0xc0
Call Trace:
  amdgpu_vm_pt_free+0x42/0xd0 [amdgpu]
  amdgpu_vm_pt_free_dfs+0xb3/0xf0 [amdgpu]
  amdgpu_vm_ptes_update+0x52d/0x850 [amdgpu]
  amdgpu_vm_update_range+0x2a6/0x640 [amdgpu]
  svm_range_unmap_from_gpus+0x110/0x300 [amdgpu]
  svm_range_cpu_invalidate_pagetables+0x535/0x600 [amdgpu]
  __mmu_notifier_invalidate_range_start+0x1cd/0x230
  unmap_vmas+0x9d/0x140
  unmap_region+0xa8/0x110

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  4 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  5 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 41 +--
  3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9d7f9b3670d7..3402fb62b4ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2057,6 +2057,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
spin_lock_init(>status_lock);
INIT_LIST_HEAD(>freed);
INIT_LIST_HEAD(>done);
+   INIT_LIST_HEAD(>pt_freed);
+   INIT_WORK(>pt_free_work, amdgpu_vm_pt_free_work);
  
  	/* create scheduler entities for page table updates */

r = drm_sched_entity_init(>immediate, DRM_SCHED_PRIORITY_NORMAL,
@@ -2258,6 +2260,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
  
  	amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
  
+	flush_work(>pt_free_work);

+
root = amdgpu_bo_ref(vm->root.bo);
amdgpu_bo_reserve(root, true);
amdgpu_vm_set_pasid(adev, vm, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98895c8fef6c..2275c692d1ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -278,6 +278,10 @@ struct amdgpu_vm {
/* BOs which are invalidated, has been updated in the PTs */
struct list_headdone;
  
+	/* PT BOs scheduled to free and fill with zero if vm_resv is not hold */

+   struct list_headpt_freed;
+   struct work_struct  pt_free_work;
+
/* contains the page directory */
struct amdgpu_vm_bo_base root;
struct dma_fence*last_update;
@@ -473,6 +477,7 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params 
*params,
  int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
  uint64_t start, uint64_t end,
  uint64_t dst, uint64_t flags);
+void amdgpu_vm_pt_free_work(struct work_struct *work);
  
  #if defined(CONFIG_DEBUG_FS)

  void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 61a4b7182d44..25277380ef5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -644,6 +644,27 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
amdgpu_bo_unref(>bo);
  }
  
+void amdgpu_vm_pt_free_work(struct work_struct *work)

+{
+   struct amdgpu_vm_bo_base *entry, *next;
+   struct amdgpu_vm *vm;
+   LIST_HEAD(pt_freed);
+
+   vm = container_of(work, struct amdgpu_vm, pt_free_work);
+
+   spin_lock(>status_lock);
+   list_splice_init(>pt_freed, _freed);
+   spin_unlock(>status_lock);
+
+   // flush_work in amdgpu_vm_fini ensure vm->root.bo is valid


Please no // style comments in kernel code.

Apart from that the whole series looks really good to me.

Thanks,
Christian.


+   amdgpu_bo_reserve(vm->root.bo, true);
+
+   list_for_each_entry_safe(entry, next, _freed, vm_status)
+   amdgpu_vm_pt_free(entry);
+
+   amdgpu_bo_unreserve(vm->root.bo);
+}
+
  /**
   * amdgpu_vm_pt_free_dfs - free PD/PT levels
   *
@@ -655,11 +676,24 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
   */
  static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
  struct amdgpu_vm *vm,
- struct amdgpu_vm_pt_cursor *start)
+ struct amdgpu_vm_pt_cursor *start,
+ bool unlocked)
  {
struct amdgpu_vm_pt_cursor cursor;
struct amdgpu_vm_bo_base *entry;
  
+	if (unlocked) {

+   spin_lock(>status_lock);
+   for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry)
+   list_move(>vm_status, 

Re: [PATCH 1/7] drm/amdgpu: Rename vm invalidate lock to status_lock

2022-09-16 Thread Christian König




On 16.09.22 at 17:05, Philip Yang wrote:

The vm status_lock will be used to protect all vm status lists.

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 30 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  4 +++-
  2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 690fd4f639f1..596f1ea8babc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -225,9 +225,9 @@ static void amdgpu_vm_bo_idle(struct amdgpu_vm_bo_base 
*vm_bo)
   */
  static void amdgpu_vm_bo_invalidated(struct amdgpu_vm_bo_base *vm_bo)
  {
-   spin_lock(_bo->vm->invalidated_lock);
+   spin_lock(_bo->vm->status_lock);
list_move(_bo->vm_status, _bo->vm->invalidated);
-   spin_unlock(_bo->vm->invalidated_lock);
+   spin_unlock(_bo->vm->status_lock);
  }
  
  /**

@@ -256,9 +256,9 @@ static void amdgpu_vm_bo_relocated(struct amdgpu_vm_bo_base 
*vm_bo)
   */
  static void amdgpu_vm_bo_done(struct amdgpu_vm_bo_base *vm_bo)
  {
-   spin_lock(_bo->vm->invalidated_lock);
+   spin_lock(_bo->vm->status_lock);
list_move(_bo->vm_status, _bo->vm->done);
-   spin_unlock(_bo->vm->invalidated_lock);
+   spin_unlock(_bo->vm->status_lock);
  }
  
  /**

@@ -936,7 +936,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t 
*vram_mem,
amdgpu_bo_get_memory(bo_va->base.bo, vram_mem,
gtt_mem, cpu_mem);
}
-   spin_lock(>invalidated_lock);
+   spin_lock(>status_lock);
list_for_each_entry_safe(bo_va, tmp, >invalidated, base.vm_status) {
if (!bo_va->base.bo)
continue;
@@ -949,7 +949,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t 
*vram_mem,
amdgpu_bo_get_memory(bo_va->base.bo, vram_mem,
gtt_mem, cpu_mem);
}
-   spin_unlock(>invalidated_lock);
+   spin_unlock(>status_lock);
  }
  /**
   * amdgpu_vm_bo_update - update all BO mappings in the vm page table
@@ -1290,12 +1290,12 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
return r;
}
  
-	spin_lock(>invalidated_lock);

+   spin_lock(>status_lock);
while (!list_empty(>invalidated)) {
bo_va = list_first_entry(>invalidated, struct amdgpu_bo_va,
 base.vm_status);
resv = bo_va->base.bo->tbo.base.resv;
-   spin_unlock(>invalidated_lock);
+   spin_unlock(>status_lock);
  
  		/* Try to reserve the BO to avoid clearing its ptes */

if (!amdgpu_vm_debug && dma_resv_trylock(resv))
@@ -1310,9 +1310,9 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
  
  		if (!clear)

dma_resv_unlock(resv);
-   spin_lock(>invalidated_lock);
+   spin_lock(>status_lock);
}
-   spin_unlock(>invalidated_lock);
+   spin_unlock(>status_lock);
  
  	return 0;

  }
@@ -1763,9 +1763,9 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
}
}
  
-	spin_lock(>invalidated_lock);

+   spin_lock(>status_lock);
list_del(_va->base.vm_status);
-   spin_unlock(>invalidated_lock);
+   spin_unlock(>status_lock);
  
  	list_for_each_entry_safe(mapping, next, _va->valids, list) {

list_del(>list);
@@ -2019,7 +2019,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
INIT_LIST_HEAD(>moved);
INIT_LIST_HEAD(>idle);
INIT_LIST_HEAD(>invalidated);
-   spin_lock_init(>invalidated_lock);
+   spin_lock_init(>status_lock);
INIT_LIST_HEAD(>freed);
INIT_LIST_HEAD(>done);
  
@@ -2584,7 +2584,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)

id = 0;
  
  	seq_puts(m, "\tInvalidated BOs:\n");

-   spin_lock(>invalidated_lock);
+   spin_lock(>status_lock);
list_for_each_entry_safe(bo_va, tmp, >invalidated, base.vm_status) {
if (!bo_va->base.bo)
continue;
@@ -2599,7 +2599,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, 
struct seq_file *m)
continue;
total_done += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
}
-   spin_unlock(>invalidated_lock);
+   spin_unlock(>status_lock);
total_done_objs = id;
  
  	seq_printf(m, "\tTotal idle size:%12lld\tobjs:\t%d\n", total_idle,

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 9ecb7f663e19..98895c8fef6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -268,7 +268,9 @@ struct amdgpu_vm {
  
  	/* regular invalidated BOs, but not yet updated in the PT */

struct 

Re: [PATCH -next 1/6] drm/amd/display: clean up some inconsistent indentings

2022-09-16 Thread Alex Deucher
Applied the series.

Thanks,

Alex

On Wed, Sep 14, 2022 at 9:58 PM Yang Li  wrote:
>
> clean up some inconsistent indentings
>
> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2177
> Reported-by: Abaci Robot 
> Signed-off-by: Yang Li 
> ---
>  .../gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c| 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c 
> b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
> index 4e3356d12147..8dfe639b6508 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_mode_vba_31.c
> @@ -1051,10 +1051,10 @@ static bool CalculatePrefetchSchedule(
> bytes_pp = myPipe->BytePerPixelY + myPipe->BytePerPixelC;
> /*rev 99*/
> prefetch_bw_pr = dml_min(1, bytes_pp * myPipe->PixelClock / (double) 
> myPipe->DPPPerPlane);
> -max_Tsw = dml_max(PrefetchSourceLinesY, PrefetchSourceLinesC) * LineTime;
> +   max_Tsw = dml_max(PrefetchSourceLinesY, PrefetchSourceLinesC) * 
> LineTime;
> prefetch_sw_bytes = PrefetchSourceLinesY * swath_width_luma_ub * 
> myPipe->BytePerPixelY + PrefetchSourceLinesC * swath_width_chroma_ub * 
> myPipe->BytePerPixelC;
> prefetch_bw_oto = dml_max(bytes_pp * myPipe->PixelClock / 
> myPipe->DPPPerPlane, prefetch_sw_bytes / (dml_max(PrefetchSourceLinesY, 
> PrefetchSourceLinesC) * LineTime));
> -prefetch_bw_oto = dml_max(prefetch_bw_pr, prefetch_sw_bytes / max_Tsw);
> +   prefetch_bw_oto = dml_max(prefetch_bw_pr, prefetch_sw_bytes / 
> max_Tsw);
>
> min_Lsw = dml_max(1, dml_max(PrefetchSourceLinesY, 
> PrefetchSourceLinesC) / max_vratio_pre);
> Lsw_oto = dml_ceil(4 * dml_max(prefetch_sw_bytes / prefetch_bw_oto / 
> LineTime, min_Lsw), 1) / 4;
> --
> 2.20.1.7.g153144c
>


Re: [PATCH][next] drm/amdkfd: Fix spelling mistake "detroyed" -> "destroyed"

2022-09-16 Thread Alex Deucher
Applied.  Thanks!

On Wed, Sep 14, 2022 at 1:15 PM Colin Ian King  wrote:
>
> There is a spelling mistake in a pr_debug message. Fix it.
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 7522bf2d2f57..c70c026c9a93 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -898,7 +898,7 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
> return VM_FAULT_SIGBUS;
> }
> if (!mmget_not_zero(svm_bo->eviction_fence->mm)) {
> -   pr_debug("addr 0x%lx of process mm is detroyed\n", addr);
> +   pr_debug("addr 0x%lx of process mm is destroyed\n", addr);
> return VM_FAULT_SIGBUS;
> }
>
> --
> 2.37.1
>


[pull] amdgpu, amdkfd, radeon drm-next-6.1

2022-09-16 Thread Alex Deucher
Hi Dave, Daniel,

Updates for 6.1.

The following changes since commit 213cb76ddc8b875e772f9f4d173feefa122716af:

  Merge tag 'drm-intel-gt-next-2022-09-09' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-next (2022-09-12 21:12:23 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-next-6.1-2022-09-16

for you to fetch changes up to dacd2d2d9d800b7ab2ee2734578112532cba8105:

  drm/amd/display: fix boolconv.cocci warning (2022-09-14 12:38:53 -0400)


amd-drm-next-6.1-2022-09-16:

amdgpu:
- PCI AER fixes
- BACO fix
- RAS fixes
- XGMI fixes
- Display SubVP fixes
- DCN 3.2 updates
- DCN 3.1.4 updates
- LLVM fixes
- CS cleanup in preparation for gang submit
- Add some new GC CG registers
- Misc cleanups

amdkfd:
- Fix CRIU regression
- CPU fault on COW mapping fixes
- Prefault fixes
- Misc cleanups

radeon:
- Misc cleanups


Alex Deucher (5):
  drm/amdgpu: add HDP remap functionality to nbio 7.7
  drm/amdgpu: fix warning about missing imu prototype
  drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
  drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
  drm/amdgpu: make sure to init common IP before gmc

Alvin Lee (4):
  drm/amd/display: Update MBLK calculation for SubVP
  drm/amd/display: SW cursor fallback for SubVP
  drm/amd/display: Update viewport position for phantom pipes
  drm/amd/display: Refactor SubVP calculation to remove FPU

Aric Cyr (4):
  drm/amd/display: Optimizations for DML math
  drm/amd/display: Fix divide by zero in DML
  drm/amd/display: Don't adjust VRR unnecessarily
  drm/amd/display: 3.2.202

Aurabindo Pillai (3):
  drm/amd/display: Revert "Fallback to SW cursor if SubVP + cursor too big"
  drm/amd/display: add workaround for subvp cursor corruption for DCN32/321
  drm/amd/display: Fix register definitions for DCN32/321

Candice Li (4):
  drm/amdgpu: Rely on MCUMC_STATUS for umc v8_10 correctable error counter 
only
  drm/amdgpu: Enable full reset when RAS is supported on gc v11_0_0
  drm/amdgpu: Add EEPROM I2C address for smu v13_0_0
  drm/amdgpu: Skip reset error status for psp v13_0_0

Charlene Liu (1):
  drm/amd/display: update header files

Chris Park (1):
  drm/amd/display: Correct I2C register offset

Christian König (6):
  drm/amdgpu: remove SRIOV and MCBP dependencies from the CS
  drm/amdgpu: move setting the job resources
  drm/amdgpu: revert "partial revert "remove ctx->lock" v2"
  drm/amdgpu: use DMA_RESV_USAGE_BOOKKEEP v2
  drm/amdgpu: cleanup CS init/fini and pass1
  drm/amdgpu: reorder CS code

Duncan Ma (1):
  drm/amd/display: Correct dram channel width for dcn314

Guchun Chen (1):
  drm/amd/pm: disable BACO entry/exit completely on several sienna cichlid 
cards

Hamza Mahfooz (1):
  drm/amdgpu: use dirty framebuffer helper

JeromeHong (1):
  drm/amd/display: Avoid force minimal transaction in case of surface_count 
equal to 0

Jiapeng Chong (13):
  drm/amd/display: make enc32_stream_encoder_dvi_set_stream_attribute static
  drm/amd/display: make dscl32_calc_lb_num_partitions static
  drm/amd/display: make dcn32_link_encoder_is_in_alt_mode and 
dcn32_link_encoder_get_max_link_cap static
  drm/amd/display: make mpc32_program_shaper and mpc32_program_3dlut static
  drm/amd/display: Remove the unused function copy_stream_update_to_stream()
  drm/amd/display: make some functions static
  drm/amd/display: make enc314_stream_encoder_dvi_set_stream_attribute 
static
  drm/amd/display: make optc314_phantom_crtc_post_enable static
  drm/amd/display: make some functions static
  drm/amd/display: make mmhubbub32_config_mcif_buf static
  drm/amd/display: make some functions static
  drm/amd/display: make some functions static
  drm/amd/display: make optc32_phantom_crtc_post_enable, 
optc32_setup_manual_trigger and optc32_set_drr static

Jilin Yuan (1):
  drm/amd/display: fix repeated words in comments

JinZe.Xu (1):
  drm/amd/display: Assign link type before check dsc workaround.

Jingyu Wang (5):
  drm/amdgpu: cleanup coding style in amdgpu_acpi.c
  drm/amdgpu: cleanup coding style in amdgpu_sync.c file
  drm/amdgpu: cleanup coding style in amdgpu_amdkfd.c
  drm/amdgpu: cleanup coding style in amdgpu_amdkfd_gpuvm.c
  drm/amdgpu/display: remove unneeded "default n" options

John Clements (1):
  drm/amdgpu: added support for ras driver loading

Lee, Alvin (1):
  drm/amd/display: Added debug option for forcing subvp num ways

Leo Chen (2):
  drm/amd/display: Fixing DIG FIFO Error
  drm/amd/display: Adding log for spread_spectrum_info

Li Ma (1):
  drm/amd/pm:add new gpu_metrics_v2_3 to acquire average temperature info

Lijo Lazar 

[PATCH 3/7] drm/amdgpu: Use vm status_lock to protect vm idle list

2022-09-16 Thread Philip Yang
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 4a1cb20deb2d..c3412709e626 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -211,7 +211,9 @@ static void amdgpu_vm_bo_moved(struct amdgpu_vm_bo_base 
*vm_bo)
  */
 static void amdgpu_vm_bo_idle(struct amdgpu_vm_bo_base *vm_bo)
 {
+   spin_lock(_bo->vm->status_lock);
list_move(_bo->vm_status, _bo->vm->idle);
+   spin_unlock(_bo->vm->status_lock);
vm_bo->moved = false;
 }
 
@@ -2554,6 +2556,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, 
struct seq_file *m)
unsigned int total_done_objs = 0;
unsigned int id = 0;
 
+   spin_lock(>status_lock);
seq_puts(m, "\tIdle BOs:\n");
list_for_each_entry_safe(bo_va, tmp, >idle, base.vm_status) {
if (!bo_va->base.bo)
@@ -2591,7 +2594,6 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, 
struct seq_file *m)
id = 0;
 
seq_puts(m, "\tInvalidated BOs:\n");
-   spin_lock(>status_lock);
list_for_each_entry_safe(bo_va, tmp, >invalidated, base.vm_status) {
if (!bo_va->base.bo)
continue;
-- 
2.35.1



[PATCH 2/7] drm/amdgpu: Use vm status_lock to protect relocated list

2022-09-16 Thread Philip Yang
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 596f1ea8babc..4a1cb20deb2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -240,10 +240,13 @@ static void amdgpu_vm_bo_invalidated(struct 
amdgpu_vm_bo_base *vm_bo)
  */
 static void amdgpu_vm_bo_relocated(struct amdgpu_vm_bo_base *vm_bo)
 {
-   if (vm_bo->bo->parent)
+   if (vm_bo->bo->parent) {
+   spin_lock(_bo->vm->status_lock);
list_move(_bo->vm_status, _bo->vm->relocated);
-   else
+   spin_unlock(_bo->vm->status_lock);
+   } else {
amdgpu_vm_bo_idle(vm_bo);
+   }
 }
 
 /**
@@ -680,9 +683,14 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
struct amdgpu_vm_update_params params;
struct amdgpu_vm_bo_base *entry;
bool flush_tlb_needed = false;
+   LIST_HEAD(relocated);
int r, idx;
 
-   if (list_empty(>relocated))
+   spin_lock(>status_lock);
+   list_splice_init(>relocated, );
+   spin_unlock(>status_lock);
+
+   if (list_empty())
return 0;
 
if (!drm_dev_enter(adev_to_drm(adev), ))
@@ -697,7 +705,7 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
if (r)
goto error;
 
-   list_for_each_entry(entry, >relocated, vm_status) {
+   list_for_each_entry(entry, , vm_status) {
/* vm_flush_needed after updating moved PDEs */
flush_tlb_needed |= entry->moved;
 
@@ -713,9 +721,8 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
if (flush_tlb_needed)
atomic64_inc(>tlb_seq);
 
-   while (!list_empty(>relocated)) {
-   entry = list_first_entry(>relocated,
-struct amdgpu_vm_bo_base,
+   while (!list_empty()) {
+   entry = list_first_entry(, struct amdgpu_vm_bo_base,
 vm_status);
amdgpu_vm_bo_idle(entry);
}
@@ -912,6 +919,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t 
*vram_mem,
 {
struct amdgpu_bo_va *bo_va, *tmp;
 
+   spin_lock(>status_lock);
list_for_each_entry_safe(bo_va, tmp, >idle, base.vm_status) {
if (!bo_va->base.bo)
continue;
@@ -936,7 +944,6 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t 
*vram_mem,
amdgpu_bo_get_memory(bo_va->base.bo, vram_mem,
gtt_mem, cpu_mem);
}
-   spin_lock(>status_lock);
list_for_each_entry_safe(bo_va, tmp, >invalidated, base.vm_status) {
if (!bo_va->base.bo)
continue;
-- 
2.35.1



[PATCH 4/7] drm/amdgpu: Use vm status_lock to protect vm moved list

2022-09-16 Thread Philip Yang
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index c3412709e626..d5d170de51ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -198,7 +198,9 @@ static void amdgpu_vm_bo_evicted(struct amdgpu_vm_bo_base 
*vm_bo)
  */
 static void amdgpu_vm_bo_moved(struct amdgpu_vm_bo_base *vm_bo)
 {
+   spin_lock(_bo->vm->status_lock);
list_move(_bo->vm_status, _bo->vm->moved);
+   spin_unlock(_bo->vm->status_lock);
 }
 
 /**
@@ -1287,19 +1289,24 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
   struct amdgpu_vm *vm)
 {
-   struct amdgpu_bo_va *bo_va, *tmp;
+   struct amdgpu_bo_va *bo_va;
struct dma_resv *resv;
bool clear;
int r;
 
-   list_for_each_entry_safe(bo_va, tmp, >moved, base.vm_status) {
+   spin_lock(>status_lock);
+   while (!list_empty(>moved)) {
+   bo_va = list_first_entry(>moved, struct amdgpu_bo_va,
+base.vm_status);
+   spin_unlock(>status_lock);
+
/* Per VM BOs never need to bo cleared in the page tables */
r = amdgpu_vm_bo_update(adev, bo_va, false);
if (r)
return r;
+   spin_lock(>status_lock);
}
 
-   spin_lock(>status_lock);
while (!list_empty(>invalidated)) {
bo_va = list_first_entry(>invalidated, struct amdgpu_bo_va,
 base.vm_status);
@@ -1396,7 +1403,9 @@ static void amdgpu_vm_bo_insert_map(struct amdgpu_device 
*adev,
 
if (bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv &&
!bo_va->base.moved) {
+   spin_lock(>status_lock);
list_move(_va->base.vm_status, >moved);
+   spin_unlock(>status_lock);
}
trace_amdgpu_vm_bo_map(bo_va, mapping);
 }
-- 
2.35.1



[PATCH 7/7] drm/amdgpu: Fix amdgpu_vm_pt_free warning

2022-09-16 Thread Philip Yang
Freeing a page table BO from a vm resv unlocked context generates the
warning below.

Add a pt_free_work in vm to free page table BOs from the vm->pt_freed list.
Pass the vm resv unlock status from the page table update caller, add the
vm_bo entry to the vm->pt_freed list, and schedule the pt_free_work when
called with the vm resv unlocked.

WARNING: CPU: 12 PID: 3238 at
drivers/gpu/drm/ttm/ttm_bo.c:106 ttm_bo_set_bulk_move+0xa1/0xc0
Call Trace:
 amdgpu_vm_pt_free+0x42/0xd0 [amdgpu]
 amdgpu_vm_pt_free_dfs+0xb3/0xf0 [amdgpu]
 amdgpu_vm_ptes_update+0x52d/0x850 [amdgpu]
 amdgpu_vm_update_range+0x2a6/0x640 [amdgpu]
 svm_range_unmap_from_gpus+0x110/0x300 [amdgpu]
 svm_range_cpu_invalidate_pagetables+0x535/0x600 [amdgpu]
 __mmu_notifier_invalidate_range_start+0x1cd/0x230
 unmap_vmas+0x9d/0x140
 unmap_region+0xa8/0x110

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  4 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  5 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 41 +--
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9d7f9b3670d7..3402fb62b4ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2057,6 +2057,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
spin_lock_init(>status_lock);
INIT_LIST_HEAD(>freed);
INIT_LIST_HEAD(>done);
+   INIT_LIST_HEAD(>pt_freed);
+   INIT_WORK(>pt_free_work, amdgpu_vm_pt_free_work);
 
/* create scheduler entities for page table updates */
r = drm_sched_entity_init(>immediate, DRM_SCHED_PRIORITY_NORMAL,
@@ -2258,6 +2260,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
 
amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
 
+   flush_work(>pt_free_work);
+
root = amdgpu_bo_ref(vm->root.bo);
amdgpu_bo_reserve(root, true);
amdgpu_vm_set_pasid(adev, vm, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 98895c8fef6c..2275c692d1ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -278,6 +278,10 @@ struct amdgpu_vm {
/* BOs which are invalidated, has been updated in the PTs */
struct list_headdone;
 
+   /* PT BOs scheduled to free and fill with zero if vm_resv is not hold */
+   struct list_headpt_freed;
+   struct work_struct  pt_free_work;
+
/* contains the page directory */
struct amdgpu_vm_bo_base root;
struct dma_fence*last_update;
@@ -473,6 +477,7 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params 
*params,
 int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
  uint64_t start, uint64_t end,
  uint64_t dst, uint64_t flags);
+void amdgpu_vm_pt_free_work(struct work_struct *work);
 
 #if defined(CONFIG_DEBUG_FS)
 void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 61a4b7182d44..25277380ef5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -644,6 +644,27 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
amdgpu_bo_unref(>bo);
 }
 
+void amdgpu_vm_pt_free_work(struct work_struct *work)
+{
+   struct amdgpu_vm_bo_base *entry, *next;
+   struct amdgpu_vm *vm;
+   LIST_HEAD(pt_freed);
+
+   vm = container_of(work, struct amdgpu_vm, pt_free_work);
+
+   spin_lock(>status_lock);
+   list_splice_init(>pt_freed, _freed);
+   spin_unlock(>status_lock);
+
+   // flush_work in amdgpu_vm_fini ensure vm->root.bo is valid
+   amdgpu_bo_reserve(vm->root.bo, true);
+
+   list_for_each_entry_safe(entry, next, _freed, vm_status)
+   amdgpu_vm_pt_free(entry);
+
+   amdgpu_bo_unreserve(vm->root.bo);
+}
+
 /**
  * amdgpu_vm_pt_free_dfs - free PD/PT levels
  *
@@ -655,11 +676,24 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
  */
 static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
  struct amdgpu_vm *vm,
- struct amdgpu_vm_pt_cursor *start)
+ struct amdgpu_vm_pt_cursor *start,
+ bool unlocked)
 {
struct amdgpu_vm_pt_cursor cursor;
struct amdgpu_vm_bo_base *entry;
 
+   if (unlocked) {
+   spin_lock(>status_lock);
+   for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry)
+   list_move(>vm_status, >pt_freed);
+
+   if (start)
+   list_move(>entry->vm_status, >pt_freed);
+   spin_unlock(>status_lock);
+   

[PATCH 6/7] drm/amdgpu: Use vm status_lock to protect pt free

2022-09-16 Thread Philip Yang
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 88de9f0d4728..61a4b7182d44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -637,7 +637,10 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
}
ttm_bo_set_bulk_move(>bo->tbo, NULL);
entry->bo->vm_bo = NULL;
+
+   spin_lock(>vm->status_lock);
list_del(>vm_status);
+   spin_unlock(>vm->status_lock);
amdgpu_bo_unref(>bo);
 }
 
-- 
2.35.1



[PATCH 0/7] Fix amdgpu_vm_pt_free warning

2022-09-16 Thread Philip Yang
When an SVM range is unmapped from the CPU, the mmu notifier callback 
updates the GPU page table to unmap the SVM range from the GPU. This is 
the unlocked page table update context, as we cannot take the vm 
reservation lock. If unmapping from the GPU frees a pt bo, this causes a 
warning because the vm reservation lock is not held.

To fix the warning/race issue, we schedule pt_free_work to free the pt bo. 
This introduces another race when removing the vm_bo entry from the vm 
status lists: some of the vm status lists are protected by the vm 
invalidated lock and the rest are protected by the vm reservation lock.

The purpose of this patch series is to use one vm status_lock to protect 
all vm status lists, and to use it in pt_free_work to fix the race and 
remove the warning.
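
The shape of that fix is a fairly common deferral pattern; here is a generic 
sketch with made-up 'entry'/'free_entry' types, only to show the 
splice-under-spinlock structure - see patch 7/7 for the real code.

struct deferred_free {
	spinlock_t lock;
	struct list_head pending;	/* filled from the unlocked context */
	struct work_struct work;
};

static void deferred_free_worker(struct work_struct *work)
{
	struct deferred_free *df = container_of(work, struct deferred_free, work);
	struct entry *e, *tmp;		/* 'entry' and 'free_entry' are hypothetical */
	LIST_HEAD(local);

	/* Detach everything queued so far while holding only the spinlock. */
	spin_lock(&df->lock);
	list_splice_init(&df->pending, &local);
	spin_unlock(&df->lock);

	/* In the real code the vm root reservation is taken here before
	 * freeing; the free path then only walks the detached local list. */
	list_for_each_entry_safe(e, tmp, &local, node)
		free_entry(e);
}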

Philip Yang (7):
  drm/amdgpu: Rename vm invalidate lock to status_lock
  drm/amdgpu: Use vm status_lock to protect relocated list
  drm/amdgpu: Use vm status_lock to protect vm idle list
  drm/amdgpu: Use vm status_lock to protect vm moved list
  drm/amdgpu: Use vm status_lock to protect vm evicted list
  drm/amdgpu: Use vm status_lock to protect pt free
  drm/amdgpu: Fix amdgpu_vm_pt_free warning

 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 97 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  9 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 44 +-
 3 files changed, 117 insertions(+), 33 deletions(-)

-- 
2.35.1



[PATCH 5/7] drm/amdgpu: Use vm status_lock to protect vm evicted list

2022-09-16 Thread Philip Yang
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 27 +-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index d5d170de51ce..9d7f9b3670d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -183,10 +183,12 @@ static void amdgpu_vm_bo_evicted(struct amdgpu_vm_bo_base 
*vm_bo)
struct amdgpu_bo *bo = vm_bo->bo;
 
vm_bo->moved = true;
+   spin_lock(_bo->vm->status_lock);
if (bo->tbo.type == ttm_bo_type_kernel)
list_move(_bo->vm_status, >evicted);
else
list_move_tail(_bo->vm_status, >evicted);
+   spin_unlock(_bo->vm->status_lock);
 }
 /**
  * amdgpu_vm_bo_moved - vm_bo is moved
@@ -370,12 +372,20 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
  int (*validate)(void *p, struct amdgpu_bo *bo),
  void *param)
 {
-   struct amdgpu_vm_bo_base *bo_base, *tmp;
+   struct amdgpu_vm_bo_base *bo_base;
+   struct amdgpu_bo *shadow;
+   struct amdgpu_bo *bo;
int r;
 
-   list_for_each_entry_safe(bo_base, tmp, >evicted, vm_status) {
-   struct amdgpu_bo *bo = bo_base->bo;
-   struct amdgpu_bo *shadow = amdgpu_bo_shadowed(bo);
+   spin_lock(>status_lock);
+   while (!list_empty(>evicted)) {
+   bo_base = list_first_entry(>evicted,
+  struct amdgpu_vm_bo_base,
+  vm_status);
+   spin_unlock(>status_lock);
+
+   bo = bo_base->bo;
+   shadow = amdgpu_bo_shadowed(bo);
 
r = validate(param, bo);
if (r)
@@ -392,7 +402,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
vm->update_funcs->map_table(to_amdgpu_bo_vm(bo));
amdgpu_vm_bo_relocated(bo_base);
}
+   spin_lock(>status_lock);
}
+   spin_unlock(>status_lock);
 
amdgpu_vm_eviction_lock(vm);
vm->evicting = false;
@@ -413,13 +425,18 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
  */
 bool amdgpu_vm_ready(struct amdgpu_vm *vm)
 {
+   bool empty;
bool ret;
 
amdgpu_vm_eviction_lock(vm);
ret = !vm->evicting;
amdgpu_vm_eviction_unlock(vm);
 
-   return ret && list_empty(>evicted);
+   spin_lock(>status_lock);
+   empty = list_empty(>evicted);
+   spin_unlock(>status_lock);
+
+   return ret && empty;
 }
 
 /**
-- 
2.35.1



[PATCH 1/7] drm/amdgpu: Rename vm invalidate lock to status_lock

2022-09-16 Thread Philip Yang
The vm status_lock will be used to protect all vm status lists.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 30 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  4 +++-
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 690fd4f639f1..596f1ea8babc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -225,9 +225,9 @@ static void amdgpu_vm_bo_idle(struct amdgpu_vm_bo_base 
*vm_bo)
  */
 static void amdgpu_vm_bo_invalidated(struct amdgpu_vm_bo_base *vm_bo)
 {
-   spin_lock(_bo->vm->invalidated_lock);
+   spin_lock(_bo->vm->status_lock);
list_move(_bo->vm_status, _bo->vm->invalidated);
-   spin_unlock(_bo->vm->invalidated_lock);
+   spin_unlock(_bo->vm->status_lock);
 }
 
 /**
@@ -256,9 +256,9 @@ static void amdgpu_vm_bo_relocated(struct amdgpu_vm_bo_base 
*vm_bo)
  */
 static void amdgpu_vm_bo_done(struct amdgpu_vm_bo_base *vm_bo)
 {
-   spin_lock(_bo->vm->invalidated_lock);
+   spin_lock(_bo->vm->status_lock);
list_move(_bo->vm_status, _bo->vm->done);
-   spin_unlock(_bo->vm->invalidated_lock);
+   spin_unlock(_bo->vm->status_lock);
 }
 
 /**
@@ -936,7 +936,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t 
*vram_mem,
amdgpu_bo_get_memory(bo_va->base.bo, vram_mem,
gtt_mem, cpu_mem);
}
-   spin_lock(&vm->invalidated_lock);
+   spin_lock(&vm->status_lock);
list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
if (!bo_va->base.bo)
continue;
@@ -949,7 +949,7 @@ void amdgpu_vm_get_memory(struct amdgpu_vm *vm, uint64_t 
*vram_mem,
amdgpu_bo_get_memory(bo_va->base.bo, vram_mem,
gtt_mem, cpu_mem);
}
-   spin_unlock(&vm->invalidated_lock);
+   spin_unlock(&vm->status_lock);
 }
 /**
  * amdgpu_vm_bo_update - update all BO mappings in the vm page table
@@ -1290,12 +1290,12 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
return r;
}
 
-   spin_lock(&vm->invalidated_lock);
+   spin_lock(&vm->status_lock);
while (!list_empty(&vm->invalidated)) {
bo_va = list_first_entry(&vm->invalidated, struct amdgpu_bo_va,
 base.vm_status);
resv = bo_va->base.bo->tbo.base.resv;
-   spin_unlock(&vm->invalidated_lock);
+   spin_unlock(&vm->status_lock);
 
/* Try to reserve the BO to avoid clearing its ptes */
if (!amdgpu_vm_debug && dma_resv_trylock(resv))
@@ -1310,9 +1310,9 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
 
if (!clear)
dma_resv_unlock(resv);
-   spin_lock(&vm->invalidated_lock);
+   spin_lock(&vm->status_lock);
}
-   spin_unlock(&vm->invalidated_lock);
+   spin_unlock(&vm->status_lock);
 
return 0;
 }
@@ -1763,9 +1763,9 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
}
}
 
-   spin_lock(&vm->invalidated_lock);
+   spin_lock(&vm->status_lock);
list_del(&bo_va->base.vm_status);
-   spin_unlock(&vm->invalidated_lock);
+   spin_unlock(&vm->status_lock);
 
list_for_each_entry_safe(mapping, next, &bo_va->valids, list) {
list_del(&mapping->list);
@@ -2019,7 +2019,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
INIT_LIST_HEAD(&vm->moved);
INIT_LIST_HEAD(&vm->idle);
INIT_LIST_HEAD(&vm->invalidated);
-   spin_lock_init(&vm->invalidated_lock);
+   spin_lock_init(&vm->status_lock);
INIT_LIST_HEAD(&vm->freed);
INIT_LIST_HEAD(&vm->done);
 
@@ -2584,7 +2584,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, 
struct seq_file *m)
id = 0;
 
seq_puts(m, "\tInvalidated BOs:\n");
-   spin_lock(&vm->invalidated_lock);
+   spin_lock(&vm->status_lock);
list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
if (!bo_va->base.bo)
continue;
@@ -2599,7 +2599,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, 
struct seq_file *m)
continue;
total_done += amdgpu_bo_print_info(id++, bo_va->base.bo, m);
}
-   spin_unlock(&vm->invalidated_lock);
+   spin_unlock(&vm->status_lock);
total_done_objs = id;
 
seq_printf(m, "\tTotal idle size:%12lld\tobjs:\t%d\n", 
total_idle,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 9ecb7f663e19..98895c8fef6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -268,7 +268,9 @@ struct amdgpu_vm {
 
/* regular invalidated BOs, but not yet updated in the PT */
struct list_head

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-16 Thread Christian König




Am 16.09.22 um 15:50 schrieb Andrey Grodzovsky:


On 2022-09-16 01:18, Christian König wrote:

Am 15.09.22 um 22:37 schrieb Andrey Grodzovsky:


On 2022-09-15 15:26, Christian König wrote:

Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while 
running a lot of containers submitting gfx jobs. We have advanced 
tdr mode and mode2 reset enabled on driver.
When a compute hang job timeout happens, the 2 jobs on the gfx 
pending list may be signaled after drm_sched_stop. So they will 
not be removed from the pending list but have the 
DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will 
be rerun and removed from the pending list.
At the resubmit step, the second job (with signaled bit) will be 
resubmitted. Since it still has the signaled bit, drm_sched_job_done 
will be called directly. This decreases the hw_rq_count, which 
allows more jobs to be emitted, but does not clean the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use 
num_fences_mask in amdgpu_fence_process, when the overflow happens, 
the signal of some jobs will be skipped, which results in an 
infinite wait for the fence_drv rcu ptr.


So closing the irq before sched_stop could avoid signaling jobs after 
drm_sched_stop. And signaling jobs one by one in fence_process 
instead of using a mask will handle the overflow situation.


Another fix could be to skip submitting jobs which have already 
signaled during the resubmit stage, which may look cleaner.


Please help give some advice.



How about the code below instead? The real problem is that we 
reuse a dma fence twice, which is not in line with the dma fence 
design, so maybe this can help?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring 
*ring, struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted jobs clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
&fence->flags);

+


Upstream will pretty much kill you for that.

Re-setting a fence from a signaled to an unsignaled state is a 
massive no-go.


Christian.



Is it worse than doing fence->seqno = seq; ? This is already a huge 
hack, no?


No, it's as equally bad. I don't think we can do either.

Christian.



And all those ugly hacks are there because we reuse a dma_fence 
(hw_fence embedded into the job) and correct me if I am wrong,

but I don't think dma_fence is ever supposed to be reused.


Exactly that, yes.

Christian.



So maybe, like Victor suggested, we should move the irq close and flush 
before sched_stop - this in my opinion should solve the issue, but 
Victor - why do you then still need the change in amdgpu_fence_process? 
You will not have the overflow situation because, by moving irq_disable 
before the stop, any job that signaled will be removed from the scheduler 
pending list anyway. Also note that this change reverts 'drm/amdgpu: 
sanitize fence numbers' and could reintroduce that bug.


Andrey






Andrey






    /* TO be inline with external fence creation and 
other drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the 
sequence. Please help give some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), 
jobs from another ring (e.g. gfx) may continue signaling during 
drm_sched_stop stage. The signal bit will not be cleared.


At the resubmit stage after recovery, the job with the hw fence 
signaled bit set will call job done directly instead of going through 
fence process.
This makes the hw_rq_count decrease but leaves the rcu fence pointer 
not cleared yet.


Then an overflow happens in the fence driver slots and some jobs 
may be skipped, leaving the rcu pointer not cleared, which makes 
an infinite wait for the slot on the next fence emitted.


This infinite wait 

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-16 Thread Andrey Grodzovsky



On 2022-09-16 01:18, Christian König wrote:

Am 15.09.22 um 22:37 schrieb Andrey Grodzovsky:


On 2022-09-15 15:26, Christian König wrote:

Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running 
a lot of containers submitting gfx jobs. We have advanced tdr mode 
and mode2 reset enabled on driver.
When a compute hang job timeout happens, the 2 jobs on the gfx 
pending list may be signaled after drm_sched_stop. So they will not 
be removed from the pending list but have the 
DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will 
be rerun and removed from the pending list.
At the resubmit step, the second job (with signaled bit) will be 
resubmitted. Since it still has the signaled bit, drm_sched_job_done 
will be called directly. This decreases the hw_rq_count, which 
allows more jobs to be emitted, but does not clean the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use 
num_fences_mask in amdgpu_fence_process, when the overflow happens, 
the signal of some jobs will be skipped, which results in an infinite 
wait for the fence_drv rcu ptr.


So closing the irq before sched_stop could avoid signaling jobs after 
drm_sched_stop. And signaling jobs one by one in fence_process instead 
of using a mask will handle the overflow situation.


Another fix could be to skip submitting jobs which have already signaled 
during the resubmit stage, which may look cleaner.


Please help give some advice.



How about the code below instead? The real problem is that we 
reuse a dma fence twice, which is not in line with the dma fence design, 
so maybe this can help?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring 
*ring, struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted jobs clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
&fence->flags);

+


Upstream will pretty much kill you for that.

Re-setting a fence from a signaled to an unsignaled state is a 
massive no-go.


Christian.



Is it worse than doing fence->seqno = seq; ? This is already a huge 
hack, no?


No, it's as equally bad. I don't think we can do either.

Christian.



And all those ugly hacks are there because we reuse a dma_fence (hw_fence 
embedded into the job) and correct me if I am wrong,

but I don't think dma_fence is ever supposed to be reused.

So maybe, like Victor suggested, we should move the irq close and flush before 
sched_stop - this in my opinion should solve the issue, but Victor - why 
do you then still need the change in amdgpu_fence_process? You will not 
have the overflow situation because, by moving irq_disable before the stop, 
any job that signaled will be removed from the scheduler pending list 
anyway. Also note that this change reverts 'drm/amdgpu: sanitize fence 
numbers' and could reintroduce that bug.
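
A rough sketch of that ordering (illustration only, not an actual patch;
it assumes the per-ring fence_drv.irq_src/irq_type interrupt source that
the fence code already uses):

	/* Quiesce fence interrupts before parking the schedulers so no
	 * job can signal between drm_sched_stop() and the resubmit step.
	 * A complete fix would also flush the interrupt handler, per the
	 * "close and flush irq" suggestion above. */
	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring || !ring->fence_drv.initialized)
			continue;

		if (ring->fence_drv.irq_src)
			amdgpu_irq_put(adev, ring->fence_drv.irq_src,
				       ring->fence_drv.irq_type);

		drm_sched_stop(&ring->sched, job ? &job->base : NULL);
	}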


Andrey






Andrey






    /* TO be inline with external fence creation and 
other drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the 
sequence. Please help give some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), 
jobs from another ring (e.g. gfx) may continue signaling during 
drm_sched_stop stage. The signal bit will not be cleared.


At the resubmit stage after recovery, the job with the hw fence 
signaled bit set will call job done directly instead of going through 
fence process.
This makes the hw_rq_count decrease but leaves the rcu fence pointer 
not cleared yet.


Then an overflow happens in the fence driver slots and some jobs may 
be skipped, leaving the rcu pointer not cleared, which makes an 
infinite wait for the slot on the next fence emitted.


This infinite wait causes a job timeout on the emitting job. And the 
driver will get stuck at its sched stop step 

Re: [PATCH] drm/amdgpu: add MES and MES-KIQ version in debugfs

2022-09-16 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Yifan Zhang 

Sent: Friday, September 16, 2022 12:38 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Huang, Tim 
; Du, Xiaojian ; Zhang, Yifan 

Subject: [PATCH] drm/amdgpu: add MES and MES-KIQ version in debugfs

This patch adds the MES and MES-KIQ firmware versions to debugfs.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 24 
 include/uapi/drm/amdgpu_drm.h   |  4 
 2 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 1369c25448dc..bb0ed358909f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -328,6 +328,14 @@ static int amdgpu_firmware_info(struct 
drm_amdgpu_info_firmware *fw_info,
 fw_info->ver = adev->psp.cap_fw_version;
 fw_info->feature = adev->psp.cap_feature_version;
 break;
+   case AMDGPU_INFO_FW_MES_KIQ:
+   fw_info->ver = adev->mes.ucode_fw_version[0];
+   fw_info->feature = 0;
+   break;
+   case AMDGPU_INFO_FW_MES:
+   fw_info->ver = adev->mes.ucode_fw_version[1];
+   fw_info->feature = 0;
+   break;
 default:
 return -EINVAL;
 }
@@ -1581,6 +1589,22 @@ static int amdgpu_debugfs_firmware_info_show(struct 
seq_file *m, void *unused)
 fw_info.feature, fw_info.ver);
 }

+   /* MES_KIQ */
+   query_fw.fw_type = AMDGPU_INFO_FW_MES_KIQ;
+   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
+   if (ret)
+   return ret;
+   seq_printf(m, "MES_KIQ feature version: %u, firmware version: 0x%08x\n",
+  fw_info.feature, fw_info.ver);
+
+   /* MES */
+   query_fw.fw_type = AMDGPU_INFO_FW_MES;
+   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
+   if (ret)
+   return ret;
+   seq_printf(m, "MES feature version: %u, firmware version: 0x%08x\n",
+  fw_info.feature, fw_info.ver);
+
 seq_printf(m, "VBIOS version: %s\n", ctx->vbios_version);

 return 0;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index c2c9c674a223..12fdf62730b8 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -755,6 +755,10 @@ struct drm_amdgpu_cs_chunk_data {
 #define AMDGPU_INFO_FW_TOC  0x15
 /* Subquery id: Query CAP firmware version */
 #define AMDGPU_INFO_FW_CAP  0x16
+   /* Subquery id: Query MES_KIQ firmware version */
+   #define AMDGPU_INFO_FW_MES_KIQ  0x17
+   /* Subquery id: Query MES firmware version */
+   #define AMDGPU_INFO_FW_MES  0x18

 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED 0x0f
--
2.37.3
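
For reference, once the two subqueries above are in place, userspace can
read the same numbers through libdrm's amdgpu_query_firmware_version()
helper. A minimal sketch (the render node path is just an example):

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdint.h>
	#include <amdgpu.h>
	#include <amdgpu_drm.h>

	int main(void)
	{
		amdgpu_device_handle dev;
		uint32_t major, minor, ver, feature;
		int fd = open("/dev/dri/renderD128", O_RDWR);

		if (fd < 0 || amdgpu_device_initialize(fd, &major, &minor, &dev))
			return 1;

		/* the same query that backs the new debugfs lines */
		if (!amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_MES, 0, 0,
						   &ver, &feature))
			printf("MES feature %u, firmware 0x%08x\n", feature, ver);

		amdgpu_device_deinitialize(dev);
		return 0;
	}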



Re: [PATCH V3 46/47] drm/amd/display: Fix failures of disabling primary plans

2022-09-16 Thread Michel Dänzer
On 2022-09-15 22:44, Rodrigo Siqueira Jordao wrote:
> 
> First of all, thanks a lot for your review. I want to take this opportunity 
> to discuss this topic in more depth and learn more from you and others.

That really should have happened before submitting this patch. It was luck that 
I happened to notice it fly by.


> On 2022-09-15 04:55, Michel Dänzer wrote:
>> On 2022-09-14 22:08, Alex Hung wrote:
>>> On 2022-09-14 10:55, Michel Dänzer wrote:
 On 2022-09-14 18:30, Alex Hung wrote:
> On 2022-09-14 07:40, Michel Dänzer wrote:
>> On 2022-09-14 15:31, Michel Dänzer wrote:
>>> On 2022-09-14 07:10, Wayne Lin wrote:
 From: Alex Hung 

 [Why & How]
 This fixes kernel errors when IGT disables primary planes during the
 tests kms_universal_plane::functional_test_pipe/pageflip_test_pipe.
>>>
>>> NAK.
>>>
>>> This essentially reverts commit b836a274b797 ("drm/amdgpu/dc: Require 
>>> primary plane to be enabled whenever the CRTC is") (except that it goes 
>>> even further and completely removes the requirement for any HW plane to 
>>> be enabled when the HW cursor is), so it would reintroduce the issues 
>>> described in that commit log.
>>
>> Actually not exactly the same issues, due to going even further than 
>> reverting my fix.
>>
>> Instead, the driver will claim that an atomic commit which enables the 
>> CRTC and the cursor plane, while leaving all other KMS planes disabled, 
>> succeeds. But the HW cursor will not actually be visible.
>
> I did not observe problems with cursors (GNOME 42.4 - desktop and 
> youtube/mpv video playback: windowed/fullscreen). Are there steps to 
> reproduce cursor problems?

 As described in my last follow-up e-mail: An atomic KMS commit which 
 enables the CRTC and the cursor plane, but disables all other KMS planes 
 for the CRTC. The commit will succeed, but will not result in the HW 
 cursor being actually visible. (I don't know of any specific application 
 or test which exercises this)
>>>
>>> Did you mean cursor plane depends on primary plane (i.e. no primary plane = 
>>> no visible HW cursor)?
>>
>> Sort of. I understand the HW cursor isn't an actual separate plane in AMD 
>> HW. Instead, the HW cursor can be displayed as part of any other HW plane. 
>> This means that the HW cursor can only be visible if any other plane is 
>> enabled.
> 
> The commit that you mentioned (b836a274b797) was created to address some 
> issues reported by the user. Please, correct me if I'm wrong, but the 
> original issue could be reproduced by following these steps on Gnome with 
> Ellesmere:
> 
> 1. Lock the screen and wait for suspending;
> 2. Wake up the system a few minutes later;
> 3. See two cursors, one that can be used and another one that is not working.

The primary symptom (which affected myself as a user as well, that was the 
motivation for addressing it) was that mutter fell back to SW cursor, because 
the legacy KMS cursor ioctl returned an error. There were corresponding 
messages from mutter, visible e.g. in the journal.

That the HW cursor remained visible for some users was a secondary symptom, 
probably due to a separate bug. I don't remember ever hitting that myself.

Note that current versions of mutter use atomic KMS and probably don't hit this 
issue anymore by default. You'd need to set the environment variable 
MUTTER_DEBUG_ENABLE_ATOMIC_KMS=0 or use an older version of mutter.

> It is not evident to me why we cannot reproduce this problem. Do you have 
> some suggestions? If we find a case showing this bug, we can add it as part 
> of our tests.

Something like this should work:

1. Call drmModeRmFB for the FB currently assigned to the primary plane. This 
implicitly disables the primary plane.
2. Call drmModeSetCursor(2) with a non-0 BO handle and position & dimensions 
such that the cursor would be visible on the CRTC.
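
In libdrm terms the sequence is roughly this (a sketch; fd, crtc_id, fb_id
and the cursor BO handle are assumed to come from the compositor's
existing setup):

	#include <stdint.h>
	#include <xf86drm.h>
	#include <xf86drmMode.h>

	static void reproduce(int fd, uint32_t crtc_id, uint32_t fb_id,
			      uint32_t cursor_bo_handle)
	{
		/* 1. Drop the FB of the primary plane; this implicitly
		 *    disables the plane. */
		drmModeRmFB(fd, fb_id);

		/* 2. Ask for a HW cursor on the now plane-less CRTC and
		 *    place it somewhere visible. */
		drmModeSetCursor(fd, crtc_id, cursor_bo_handle, 64, 64);
		drmModeMoveCursor(fd, crtc_id, 100, 100);
	}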

Before b836a274b797, this resulted in drmModeSetCursor(2) returning EINVAL due 
to

if (state->enable && state->active &&
does_crtc_have_active_cursor(state) &&
dm_crtc_state->active_planes == 0)
return -EINVAL;


(The b836a274b797 commit log describes a similar error when trying to flip the 
legacy DPMS property from off to on. My memory is hazy on this, but I guess 
that would have prevented the monitor from turning back on after DPMS off if 
the primary/cursor planes happened to be disable/enabled respectively when 
trying to toggle the property)


>>> If there is no primary plane, what scenario would it be required to draw a 
>>> cursor?
>>>
>>> If this is a rare case, would it still be a concern?
>>
>> Yes. In the KMS API, the cursor plane is like any other plane. A Wayland 
>> compositor or other user space may legitimately try to display something 
>> (not necessarily a "cursor") using only the cursor plane. The driver must 
>> 

[PATCH 8/8] drm/amdgpu: add gang submit frontend v5

2022-09-16 Thread Christian König
Allows submitting jobs as a gang which needs to run on multiple engines at the
same time.

All members of the gang get the same implicit, explicit and VM dependencies. So
no gang member will start running until everything else is ready.

The last job is considered the gang leader (usually a submission to the GFX
ring) and used for signaling output dependencies.

Each job is remembered individually as user of a buffer object, so there is no
joining of work at the end.

v2: rebase and fix review comments from Andrey and Yogesh
v3: use READ instead of BOOKKEEP for now because of VM unmaps, set gang
leader only when necessary
v4: fix order of pushing jobs and adding fences found by Trigger.
v5: fix job index calculation and adding IBs to jobs

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 269 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  |  12 +-
 5 files changed, 195 insertions(+), 99 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 4f5bd96000ec..c7b1a2dfde13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -686,6 +686,7 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
ib->length_dw = ib_len;
/* This works for NO_HWS. TODO: need to handle without knowing VMID */
job->vmid = vmid;
+   job->num_ibs = 1;
 
ret = amdgpu_ib_schedule(ring, 1, ib, job, &f);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 58088c663125..964052377991 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -64,11 +64,11 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p,
return 0;
 }
 
-static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
-  struct drm_amdgpu_cs_chunk_ib *chunk_ib,
-  unsigned int *num_ibs)
+static int amdgpu_cs_job_idx(struct amdgpu_cs_parser *p,
+struct drm_amdgpu_cs_chunk_ib *chunk_ib)
 {
struct drm_sched_entity *entity;
+   unsigned int i;
int r;
 
r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type,
@@ -77,17 +77,38 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
if (r)
return r;
 
-   /* Abort if there is no run queue associated with this entity.
-* Possibly because of disabled HW IP*/
+   /*
+* Abort if there is no run queue associated with this entity.
+* Possibly because of disabled HW IP.
+*/
if (entity->rq == NULL)
return -EINVAL;
 
-   /* Currently we don't support submitting to multiple entities */
-   if (p->entity && p->entity != entity)
+   /* Check if we can add this IB to some existing job */
+   for (i = 0; i < p->gang_size; ++i)
+   if (p->entities[i] == entity)
+   return i;
+
+   /* If not increase the gang size if possible */
+   if (i == AMDGPU_CS_GANG_SIZE)
return -EINVAL;
 
-   p->entity = entity;
-   ++(*num_ibs);
+   p->entities[i] = entity;
+   p->gang_size = i + 1;
+   return i;
+}
+
+static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
+  struct drm_amdgpu_cs_chunk_ib *chunk_ib,
+  unsigned int *num_ibs)
+{
+   int r;
+
+   r = amdgpu_cs_job_idx(p, chunk_ib);
+   if (r < 0)
+   return r;
+
+   ++(num_ibs[r]);
return 0;
 }
 
@@ -161,11 +182,12 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
   union drm_amdgpu_cs *cs)
 {
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+   unsigned int num_ibs[AMDGPU_CS_GANG_SIZE] = { };
struct amdgpu_vm *vm = &fpriv->vm;
uint64_t *chunk_array_user;
uint64_t *chunk_array;
-   unsigned size, num_ibs = 0;
uint32_t uf_offset = 0;
+   unsigned int size;
int ret;
int i;
 
@@ -228,7 +250,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
if (size < sizeof(struct drm_amdgpu_cs_chunk_ib))
goto free_partial_kdata;
 
-   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, &num_ibs);
+   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, num_ibs);
if (ret)
goto free_partial_kdata;
break;
@@ -265,21 +287,28 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
}
}
 
-   ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);
-   if (ret)
-   goto free_all_kdata;
+   if (!p->gang_size)
+ 
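
To make the user-visible side concrete, a sketch of what a two-member gang
submission could look like through libdrm's amdgpu_cs_submit_raw2() (the
context, BO list handle and GPU virtual addresses are placeholders, not
part of this patch):

	#include <stdint.h>
	#include <string.h>
	#include <amdgpu.h>
	#include <amdgpu_drm.h>

	/* Two IB chunks in a single CS: one compute IB plus one GFX IB.
	 * The kernel builds one job per entity; the last job (GFX here)
	 * becomes the gang leader used for the output fence. */
	static int submit_gang(amdgpu_device_handle dev, amdgpu_context_handle ctx,
			       uint32_t bo_list, uint64_t compute_va,
			       uint32_t compute_bytes, uint64_t gfx_va,
			       uint32_t gfx_bytes, uint64_t *seq_no)
	{
		struct drm_amdgpu_cs_chunk_ib ibs[2];
		struct drm_amdgpu_cs_chunk chunks[2];
		int i;

		memset(ibs, 0, sizeof(ibs));
		ibs[0].ip_type = AMDGPU_HW_IP_COMPUTE;
		ibs[0].va_start = compute_va;
		ibs[0].ib_bytes = compute_bytes;

		ibs[1].ip_type = AMDGPU_HW_IP_GFX;
		ibs[1].va_start = gfx_va;
		ibs[1].ib_bytes = gfx_bytes;

		for (i = 0; i < 2; ++i) {
			chunks[i].chunk_id = AMDGPU_CHUNK_ID_IB;
			chunks[i].length_dw = sizeof(struct drm_amdgpu_cs_chunk_ib) / 4;
			chunks[i].chunk_data = (uint64_t)(uintptr_t)&ibs[i];
		}

		return amdgpu_cs_submit_raw2(dev, ctx, bo_list, 2, chunks, seq_no);
	}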

[PATCH 4/8] drm/amdgpu: move entity selection and job init earlier during CS

2022-09-16 Thread Christian König
Initialize the entity for the CS and scheduler job much earlier.

v2: fix job initialisation order and use correct scheduler instance

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 54 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  5 +++
 2 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 265ed2118a80..58088c663125 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -68,6 +68,25 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
   struct drm_amdgpu_cs_chunk_ib *chunk_ib,
   unsigned int *num_ibs)
 {
+   struct drm_sched_entity *entity;
+   int r;
+
+   r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type,
+ chunk_ib->ip_instance,
+ chunk_ib->ring, &entity);
+   if (r)
+   return r;
+
+   /* Abort if there is no run queue associated with this entity.
+* Possibly because of disabled HW IP*/
+   if (entity->rq == NULL)
+   return -EINVAL;
+
+   /* Currently we don't support submitting to multiple entities */
+   if (p->entity && p->entity != entity)
+   return -EINVAL;
+
+   p->entity = entity;
++(*num_ibs);
return 0;
 }
@@ -250,6 +269,10 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
if (ret)
goto free_all_kdata;
 
+   ret = drm_sched_job_init(&p->job->base, p->entity, &fpriv->vm);
+   if (ret)
+   goto free_all_kdata;
+
if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) {
ret = -ECANCELED;
goto free_all_kdata;
@@ -286,32 +309,11 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
 {
struct drm_amdgpu_cs_chunk_ib *chunk_ib = chunk->kdata;
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+   struct amdgpu_ring *ring = amdgpu_job_ring(p->job);
struct amdgpu_ib *ib = &p->job->ibs[*num_ibs];
struct amdgpu_vm *vm = &fpriv->vm;
-   struct drm_sched_entity *entity;
-   struct amdgpu_ring *ring;
int r;
 
-   r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type,
- chunk_ib->ip_instance,
- chunk_ib->ring, &entity);
-   if (r)
-   return r;
-
-   /*
-* Abort if there is no run queue associated with this entity.
-* Possibly because of disabled HW IP.
-*/
-   if (entity->rq == NULL)
-   return -EINVAL;
-
-   /* Currently we don't support submitting to multiple entities */
-   if (p->entity && p->entity != entity)
-   return -EINVAL;
-
-   p->entity = entity;
-
-   ring = to_amdgpu_ring(entity->rq->sched);
/* MM engine doesn't support user fences */
if (p->uf_entry.tv.bo && ring->funcs->no_user_fence)
return -EINVAL;
@@ -978,8 +980,8 @@ static void trace_amdgpu_cs_ibs(struct amdgpu_cs_parser 
*parser)
 
 static int amdgpu_cs_patch_ibs(struct amdgpu_cs_parser *p)
 {
-   struct amdgpu_ring *ring = to_amdgpu_ring(p->entity->rq->sched);
struct amdgpu_job *job = p->job;
+   struct amdgpu_ring *ring = amdgpu_job_ring(job);
unsigned int i;
int r;
 
@@ -1171,10 +1173,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
job = p->job;
p->job = NULL;
 
-   r = drm_sched_job_init(&job->base, p->entity, &fpriv->vm);
-   if (r)
-   goto error_unlock;
-
drm_sched_job_arm(&job->base);
 
/* No memory allocation is allowed while holding the notifier lock.
@@ -1231,8 +1229,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 error_abort:
drm_sched_job_cleanup(&job->base);
mutex_unlock(&p->adev->notifier_lock);
-
-error_unlock:
amdgpu_job_free(job);
return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 2a1961bf1194..866d35bbe073 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -72,6 +72,11 @@ struct amdgpu_job {
struct amdgpu_ibibs[];
 };
 
+static inline struct amdgpu_ring *amdgpu_job_ring(struct amdgpu_job *job)
+{
+   return to_amdgpu_ring(job->base.entity->rq->sched);
+}
+
 int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
 struct amdgpu_job **job, struct amdgpu_vm *vm);
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
-- 
2.25.1



[PATCH 7/8] drm/amdgpu: add gang submit backend v2

2022-09-16 Thread Christian König
Allows submitting jobs as a gang which needs to run on multiple
engines at the same time.

Basic idea is that we have a global gang submit fence representing when the
gang leader is finally pushed to run on the hardware last.

Jobs submitted as a gang are never re-submitted in case of a GPU reset since this
won't work and will just deadlock the hardware immediately again.

v2: fix logic inversion, improve documentation, fix rcu

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 35 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 28 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h|  3 ++
 4 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 79bb6fd83094..ae9371b172e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -885,6 +885,7 @@ struct amdgpu_device {
u64 fence_context;
unsignednum_rings;
struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
+   struct dma_fence __rcu  *gang_submit;
boolib_pool_ready;
struct amdgpu_sa_managerib_pools[AMDGPU_IB_POOL_MAX];
struct amdgpu_sched 
gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
@@ -1294,6 +1295,8 @@ u32 amdgpu_device_pcie_port_rreg(struct amdgpu_device 
*adev,
u32 reg);
 void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev,
u32 reg, u32 v);
+struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
+   struct dma_fence *gang);
 
 /* atpx handler */
 #if defined(CONFIG_VGA_SWITCHEROO)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..b6f6445e50d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3503,6 +3503,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
adev->gmc.gart_size = 512 * 1024 * 1024;
adev->accel_working = false;
adev->num_rings = 0;
+   RCU_INIT_POINTER(adev->gang_submit, dma_fence_get_stub());
adev->mman.buffer_funcs = NULL;
adev->mman.buffer_funcs_ring = NULL;
adev->vm_manager.vm_pte_funcs = NULL;
@@ -3984,6 +3985,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
release_firmware(adev->firmware.gpu_info_fw);
adev->firmware.gpu_info_fw = NULL;
adev->accel_working = false;
+   dma_fence_put(rcu_dereference_protected(adev->gang_submit, true));
 
amdgpu_reset_fini(adev);
 
@@ -5919,3 +5921,36 @@ void amdgpu_device_pcie_port_wreg(struct amdgpu_device 
*adev,
(void)RREG32(data);
spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 }
+
+/**
+ * amdgpu_device_switch_gang - switch to a new gang
+ * @adev: amdgpu_device pointer
+ * @gang: the gang to switch to
+ *
+ * Try to switch to a new gang.
+ * Returns: NULL if we switched to the new gang or a reference to the current
+ * gang leader.
+ */
+struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
+   struct dma_fence *gang)
+{
+   struct dma_fence *old = NULL;
+
+   do {
+   dma_fence_put(old);
+   rcu_read_lock();
+   old = dma_fence_get_rcu_safe(&adev->gang_submit);
+   rcu_read_unlock();
+
+   if (old == gang)
+   break;
+
+   if (!dma_fence_is_signaled(old))
+   return old;
+
+   } while (cmpxchg((struct dma_fence __force **)&adev->gang_submit,
+old, gang) != old);
+
+   dma_fence_put(old);
+   return NULL;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 877b3c22a8a8..cfbe19cfe9af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -173,11 +173,29 @@ static void amdgpu_job_free_cb(struct drm_sched_job 
*s_job)
dma_fence_put(&job->hw_fence);
 }
 
+void amdgpu_job_set_gang_leader(struct amdgpu_job *job,
+   struct amdgpu_job *leader)
+{
+   struct dma_fence *fence = &leader->base.s_fence->scheduled;
+
+   WARN_ON(job->gang_submit);
+
+   /*
+* Don't add a reference when we are the gang leader to avoid circle
+* dependency.
+*/
+   if (job != leader)
+   dma_fence_get(fence);
+   job->gang_submit = fence;
+}
+
 void amdgpu_job_free(struct amdgpu_job *job)
 {
amdgpu_job_free_resources(job);
amdgpu_sync_free(&job->sync);
amdgpu_sync_free(&job->sched_sync);
+   if (job->gang_submit != &job->base.s_fence->scheduled)
+   
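
As an aside, a sketch of how a gang member would presumably consume this
fence from the scheduler's dependency path (it assumes the
amdgpu_device_switch_gang() helper above; illustration only, not part of
this patch):

	/* Returning a non-NULL fence makes the scheduler wait for it before
	 * running the job, so a gang member only really starts once
	 * adev->gang_submit has been switched to its own leader. */
	static struct dma_fence *resolve_gang_dependency(struct amdgpu_ring *ring,
							 struct amdgpu_job *job)
	{
		if (!job->gang_submit)
			return NULL;

		/* NULL once the switch to our gang succeeded, otherwise the
		 * previous, still unsignaled gang leader to wait on. */
		return amdgpu_device_switch_gang(ring->adev, job->gang_submit);
	}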

[PATCH 3/8] drm/amdgpu: fix user fence CS check

2022-09-16 Thread Christian König
An offset of zero is valid, check if the BO is present or not.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 5c37dde97ff2..265ed2118a80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -313,7 +313,7 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
 
ring = to_amdgpu_ring(entity->rq->sched);
/* MM engine doesn't support user fences */
-   if (p->job->uf_addr && ring->funcs->no_user_fence)
+   if (p->uf_entry.tv.bo && ring->funcs->no_user_fence)
return -EINVAL;
 
if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
-- 
2.25.1



[PATCH 1/8] drm/amdgpu: cleanup CS pass2 v5

2022-09-16 Thread Christian König
Cleanup the coding style and function names to represent the data
they process for pass2 as well.

Go over the chunks only twice now instead of multiple times.

v2: fix job initialisation order and use correct scheduler instance
v3: try to move all functional changes into a separate patch.
v4: separate reordering, pass1 and pass2 change
v5: fix va_start calculation

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 352 -
 1 file changed, 168 insertions(+), 184 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index e104d7ef3c3d..c04073f4ead5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -278,93 +278,84 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
return ret;
 }
 
-static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
-struct amdgpu_cs_parser *parser)
+static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
+  struct amdgpu_cs_chunk *chunk,
+  unsigned int *num_ibs,
+  unsigned int *ce_preempt,
+  unsigned int *de_preempt)
 {
-   struct amdgpu_fpriv *fpriv = parser->filp->driver_priv;
+   struct drm_amdgpu_cs_chunk_ib *chunk_ib = chunk->kdata;
+   struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+   struct amdgpu_ib *ib = &p->job->ibs[*num_ibs];
struct amdgpu_vm *vm = &fpriv->vm;
-   int r, ce_preempt = 0, de_preempt = 0;
+   struct drm_sched_entity *entity;
struct amdgpu_ring *ring;
-   int i, j;
-
-   for (i = 0, j = 0; i < parser->nchunks && j < parser->job->num_ibs; 
i++) {
-   struct amdgpu_cs_chunk *chunk;
-   struct amdgpu_ib *ib;
-   struct drm_amdgpu_cs_chunk_ib *chunk_ib;
-   struct drm_sched_entity *entity;
-
-   chunk = &parser->chunks[i];
-   ib = &parser->job->ibs[j];
-   chunk_ib = (struct drm_amdgpu_cs_chunk_ib *)chunk->kdata;
+   int r;
 
-   if (chunk->chunk_id != AMDGPU_CHUNK_ID_IB)
-   continue;
+   r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type,
+ chunk_ib->ip_instance,
+ chunk_ib->ring, &entity);
+   if (r)
+   return r;
 
-   if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
-   chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT) {
-   if (chunk_ib->flags & AMDGPU_IB_FLAG_CE)
-   ce_preempt++;
-   else
-   de_preempt++;
+   /*
+* Abort if there is no run queue associated with this entity.
+* Possibly because of disabled HW IP.
+*/
+   if (entity->rq == NULL)
+   return -EINVAL;
 
-   /* each GFX command submit allows 0 or 1 IB preemptible 
for CE & DE */
-   if (ce_preempt > 1 || de_preempt > 1)
-   return -EINVAL;
-   }
+   /* Currently we don't support submitting to multiple entities */
+   if (p->entity && p->entity != entity)
+   return -EINVAL;
 
-   r = amdgpu_ctx_get_entity(parser->ctx, chunk_ib->ip_type,
- chunk_ib->ip_instance, chunk_ib->ring,
- &entity);
-   if (r)
-   return r;
+   p->entity = entity;
 
-   if (chunk_ib->flags & AMDGPU_IB_FLAG_PREAMBLE)
-   parser->job->preamble_status |=
-   AMDGPU_PREAMBLE_IB_PRESENT;
+   ring = to_amdgpu_ring(entity->rq->sched);
+   /* MM engine doesn't support user fences */
+   if (p->job->uf_addr && ring->funcs->no_user_fence)
+   return -EINVAL;
 
-   if (parser->entity && parser->entity != entity)
-   return -EINVAL;
+   if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
+   chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT) {
+   if (chunk_ib->flags & AMDGPU_IB_FLAG_CE)
+   (*ce_preempt)++;
+   else
+   (*de_preempt)++;
 
-   /* Return if there is no run queue associated with this entity.
-* Possibly because of disabled HW IP*/
-   if (entity->rq == NULL)
+   /* Each GFX command submit allows only 1 IB max
+* preemptible for CE & DE */
+   if (*ce_preempt > 1 || *de_preempt > 1)
return -EINVAL;
+   }
 
-   parser->entity = entity;
-
-   ring = to_amdgpu_ring(entity->rq->sched);
-   r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
-  chunk_ib->ib_bytes : 0,
-   

[PATCH 6/8] drm/amdgpu: cleanup instance limit on VCN4 v3

2022-09-16 Thread Christian König
Similar to what we did for VCN3 use the job instead of the parser
entity. Cleanup the coding style quite a bit as well.

v2: merge improved application check into this patch
v3: finally fix the check

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 46 +++
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index 09c89faa8c27..bdabe5da8d75 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1591,21 +1591,23 @@ static void vcn_v4_0_unified_ring_set_wptr(struct 
amdgpu_ring *ring)
}
 }
 
-static int vcn_v4_0_limit_sched(struct amdgpu_cs_parser *p)
+static int vcn_v4_0_limit_sched(struct amdgpu_cs_parser *p,
+   struct amdgpu_job *job)
 {
struct drm_gpu_scheduler **scheds;
 
/* The create msg must be in the first IB submitted */
-   if (atomic_read(&p->entity->fence_seq))
+   if (atomic_read(&job->base.entity->fence_seq))
return -EINVAL;
 
-   scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_ENC]
-   [AMDGPU_RING_PRIO_0].sched;
-   drm_sched_entity_modify_sched(p->entity, scheds, 1);
+   scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC]
+   [AMDGPU_RING_PRIO_DEFAULT].sched;
+   drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
return 0;
 }
 
-static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr)
+static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
+   uint64_t addr)
 {
struct ttm_operation_ctx ctx = { false, false };
struct amdgpu_bo_va_mapping *map;
@@ -1676,7 +1678,7 @@ static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, 
uint64_t addr)
if (create[0] == 0x7 || create[0] == 0x10 || create[0] == 0x11)
continue;
 
-   r = vcn_v4_0_limit_sched(p);
+   r = vcn_v4_0_limit_sched(p, job);
if (r)
goto out;
}
@@ -1689,32 +1691,34 @@ static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, 
uint64_t addr)
 #define RADEON_VCN_ENGINE_TYPE_DECODE 
(0x0003)
 
 static int vcn_v4_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p,
-   struct amdgpu_job *job,
-   struct amdgpu_ib *ib)
+  struct amdgpu_job *job,
+  struct amdgpu_ib *ib)
 {
-   struct amdgpu_ring *ring = to_amdgpu_ring(p->entity->rq->sched);
-   struct amdgpu_vcn_decode_buffer *decode_buffer = NULL;
+   struct amdgpu_ring *ring = amdgpu_job_ring(job);
+   struct amdgpu_vcn_decode_buffer *decode_buffer;
+   uint64_t addr;
uint32_t val;
-   int r = 0;
 
/* The first instance can decode anything */
if (!ring->me)
-   return r;
+   return 0;
 
/* unified queue ib header has 8 double words. */
if (ib->length_dw < 8)
-   return r;
+   return 0;
 
val = amdgpu_ib_get_value(ib, 6); //RADEON_VCN_ENGINE_TYPE
+   if (val != RADEON_VCN_ENGINE_TYPE_DECODE)
+   return 0;
 
-   if (val == RADEON_VCN_ENGINE_TYPE_DECODE) {
-   decode_buffer = (struct amdgpu_vcn_decode_buffer *)&ib->ptr[10];
+   decode_buffer = (struct amdgpu_vcn_decode_buffer *)&ib->ptr[10];
 
-   if (decode_buffer->valid_buf_flag  & 0x1)
-   r = vcn_v4_0_dec_msg(p, 
((u64)decode_buffer->msg_buffer_address_hi) << 32 |
-   
decode_buffer->msg_buffer_address_lo);
-   }
-   return r;
+   if (!(decode_buffer->valid_buf_flag  & 0x1))
+   return 0;
+
+   addr = ((u64)decode_buffer->msg_buffer_address_hi) << 32 |
+   decode_buffer->msg_buffer_address_lo;
+   return vcn_v4_0_dec_msg(p, job, addr);
 }
 
 static const struct amdgpu_ring_funcs vcn_v4_0_unified_ring_vm_funcs = {
-- 
2.25.1



[PATCH 5/8] drm/amdgpu: revert "fix limiting AV1 to the first instance on VCN3" v3

2022-09-16 Thread Christian König
This reverts commit 250195ff744f260c169f5427422b6f39c58cb883.

The job should now be initialized when we reach the parser functions.

v2: merge improved application check into this patch
v3: back to the original test, but use the right ring

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 39405f0db824..9c8b5fd99037 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -1761,21 +1761,23 @@ static const struct amdgpu_ring_funcs 
vcn_v3_0_dec_sw_ring_vm_funcs = {
.emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
 };
 
-static int vcn_v3_0_limit_sched(struct amdgpu_cs_parser *p)
+static int vcn_v3_0_limit_sched(struct amdgpu_cs_parser *p,
+   struct amdgpu_job *job)
 {
struct drm_gpu_scheduler **scheds;
 
/* The create msg must be in the first IB submitted */
-   if (atomic_read(&p->entity->fence_seq))
+   if (atomic_read(&job->base.entity->fence_seq))
return -EINVAL;
 
scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC]
[AMDGPU_RING_PRIO_DEFAULT].sched;
-   drm_sched_entity_modify_sched(p->entity, scheds, 1);
+   drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
return 0;
 }
 
-static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr)
+static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
+   uint64_t addr)
 {
struct ttm_operation_ctx ctx = { false, false };
struct amdgpu_bo_va_mapping *map;
@@ -1846,7 +1848,7 @@ static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, 
uint64_t addr)
if (create[0] == 0x7 || create[0] == 0x10 || create[0] == 0x11)
continue;
 
-   r = vcn_v3_0_limit_sched(p);
+   r = vcn_v3_0_limit_sched(p, job);
if (r)
goto out;
}
@@ -1860,7 +1862,7 @@ static int vcn_v3_0_ring_patch_cs_in_place(struct 
amdgpu_cs_parser *p,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib)
 {
-   struct amdgpu_ring *ring = to_amdgpu_ring(p->entity->rq->sched);
+   struct amdgpu_ring *ring = amdgpu_job_ring(job);
uint32_t msg_lo = 0, msg_hi = 0;
unsigned i;
int r;
@@ -1879,7 +1881,8 @@ static int vcn_v3_0_ring_patch_cs_in_place(struct 
amdgpu_cs_parser *p,
msg_hi = val;
} else if (reg == PACKET0(p->adev->vcn.internal.cmd, 0) &&
   val == 0) {
-   r = vcn_v3_0_dec_msg(p, ((u64)msg_hi) << 32 | msg_lo);
+   r = vcn_v3_0_dec_msg(p, job,
+((u64)msg_hi) << 32 | msg_lo);
if (r)
return r;
}
-- 
2.25.1



[PATCH 2/8] drm/amdgpu: cleanup error handling in amdgpu_cs_parser_bos

2022-09-16 Thread Christian König
Return early on success and so remove all those "if (r)" in the error
path.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 37 +-
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c04073f4ead5..5c37dde97ff2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -933,35 +933,34 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
*p,
if (r)
goto error_validate;
 
-   amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
-p->bytes_moved_vis);
-
-   amdgpu_job_set_resources(p->job, p->bo_list->gds_obj,
-p->bo_list->gws_obj, p->bo_list->oa_obj);
-
-   if (!r && p->uf_entry.tv.bo) {
+   if (p->uf_entry.tv.bo) {
struct amdgpu_bo *uf = ttm_to_amdgpu_bo(p->uf_entry.tv.bo);
 
r = amdgpu_ttm_alloc_gart(&uf->tbo);
+   if (r)
+   goto error_validate;
+
p->job->uf_addr += amdgpu_bo_gpu_offset(uf);
}
 
+   amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
+p->bytes_moved_vis);
+   amdgpu_job_set_resources(p->job, p->bo_list->gds_obj,
+p->bo_list->gws_obj, p->bo_list->oa_obj);
+   return 0;
+
 error_validate:
-   if (r)
-   ttm_eu_backoff_reservation(&p->ticket, &p->validated);
+   ttm_eu_backoff_reservation(&p->ticket, &p->validated);
 
 out_free_user_pages:
-   if (r) {
-   amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
-   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+   amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
 
-   if (!e->user_pages)
-   continue;
-   amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
-   kvfree(e->user_pages);
-   e->user_pages = NULL;
-   }
-   mutex_unlock(&p->bo_list->bo_list_mutex);
+   if (!e->user_pages)
+   continue;
+   amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
+   kvfree(e->user_pages);
+   e->user_pages = NULL;
}
return r;
 }
-- 
2.25.1



Final gang submit patches

2022-09-16 Thread Christian König
Hey guys,

so thanks to Ruijing I was finally able to hammer out all the known VCN
regressions from this patch set.

Alex can you review those? AFAIK you are pretty much the only other
person deep enough in the CS IOCTL for that.

Going to tackle the userptr issue Bas stumbled over next.

Thanks,
Christian.




RE: [PATCH 0/4] support query rlcv/rlcp firmware version

2022-09-16 Thread Xu, Feifei
[AMD Official Use Only - General]



Series is Reviewed-by: Feifei Xu 

-Original Message-
From: amd-gfx  On Behalf Of Hawking Zhang
Sent: Friday, September 16, 2022 1:00 AM
To: amd-gfx@lists.freedesktop.org; Gao, Likun ; Deucher, 
Alexander 
Cc: Zhang, Hawking 
Subject: [PATCH 0/4] support query rlcv/rlcp firmware version

To allow querying the rlcv/rlcp firmware version info

Hawking Zhang (4):
  drm/amdgpu: save rlcv/rlcp ucode version in amdgpu_gfx
  drm/amdgpu: support print rlc v2_x ucode hdr
  drm/amdgpu: add two new subquery ids
  drm/amdgpu: add rlcv/rlcp version info to debugfs

 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  24 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 168 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h |   4 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|   5 +
 include/uapi/drm/amdgpu_drm.h |   4 +
 6 files changed, 159 insertions(+), 50 deletions(-)

-- 
2.17.1