Re: [PATCH] drm/gem: Expose the buffer object handle to userspace last

2023-02-23 Thread Christian König

Am 20.02.23 um 11:23 schrieb Tvrtko Ursulin:


On 20/02/2023 10:01, Christian König wrote:

Am 20.02.23 um 10:55 schrieb Tvrtko Ursulin:


Hi,

On 14/02/2023 13:59, Christian König wrote:

Am 14.02.23 um 13:50 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Currently drm_gem_handle_create_tail exposes the handle to userspace
before the buffer object constructions is complete. This allowing
of working against a partially constructed object, which may also 
be in

the process of having its creation fail, can have a range of negative
outcomes.

A lot of those will depend on what the individual drivers are 
doing in
their obj->funcs->open() callbacks, and also with a common failure 
mode

being -ENOMEM from drm_vma_node_allow.

We can make sure none of this can happen by allocating a handle last,
although with a downside that more of the function now runs under the
dev->object_name_lock.

Looking into the individual drivers open() hooks, we have
amdgpu_gem_object_open which seems like it could have a potential 
security

issue without this change.

A couple drivers like qxl_gem_object_open and vmw_gem_object_open
implement no-op hooks so no impact for them.

A bunch of other require a deeper look by individual owners to 
asses for

impact. Those are lima_gem_object_open, nouveau_gem_object_open,
panfrost_gem_open, radeon_gem_object_open and 
virtio_gpu_gem_object_open.


Putting aside the risk assesment of the above, some common 
scenarios to

think about are along these lines:

1)
Userspace closes a handle by speculatively "guessing" it from a 
second

thread.

This results in an unreachable buffer object so, a memory leak.

2)
Same as 1), but object is in the process of getting closed (failed
creation).

The second thread is then able to re-cycle the handle and 
idr_remove would
in the first thread would then remove the handle it does not own 
from the

idr.

3)
Going back to the earlier per driver problem space - individual 
impact
assesment of allowing a second thread to access and operate on a 
partially

constructed handle / object. (Can something crash? Leak information?)

In terms of identifying when the problem started I will tag some 
patches

as references, but not all, if even any, of them actually point to a
broken state. I am just identifying points at which more 
opportunity for

issues to arise was added.


Yes I've looked into this once as well, but couldn't completely 
solve it for some reason.


Give me a day or two to get this tested and all the logic swapped 
back into my head again.


Managed to recollect what the problem with earlier attempts was?


Nope, that's way to long ago. I can only assume that I ran into 
problems with the object_name_lock.


Probably best to double check if that doesn't result in a lock 
inversion when somebody grabs the reservation lock in their ->load() 
callback.


Hmm I don't immediately follow the connection. But I have only found 
radeon_driver_load_kms as using the load callback. Is there any 
lockdep enabled CI for that driver which could tell us if there is a 
problem there?


We don't have CI for radeon and most likely never will, that hw is just 
to old. But we also have amdgpu_gem_object_open() and that looks 
suspiciously like trouble.


The function makes sure that every BO is registered in the VM house 
keeping functions of the drm_file and while doing so grabs a few locks. 
I'm not sure what the locking order of those are.


Could be that this will work, could be that it breaks. I will ping 
internally today if somebody from my team can take a look at this patch.


Regards,
Christian.



Regards,

Tvrtko



Regards,
Christian.



Regards,

Tvrtko


Christian.



References: 304eda32920b ("drm/gem: add hooks to notify driver 
when object handle is created/destroyed")

References: ca481c9b2a3a ("drm/gem: implement vma access management")
References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
Cc: dri-de...@lists.freedesktop.org
Cc: Rob Clark 
Cc: Ben Skeggs 
Cc: David Herrmann 
Cc: Noralf Trønnes 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: l...@lists.freedesktop.org
Cc: nouv...@lists.freedesktop.org
Cc: Steven Price 
Cc: virtualizat...@lists.linux-foundation.org
Cc: spice-de...@lists.freedesktop.org
Cc: Zack Rusin 
---
  drivers/gpu/drm/drm_gem.c | 48 
+++

  1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index aa15c52ae182..e3d897bca0f2 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file 
*file_priv,

 u32 *handlep)
  {
  struct drm_device *dev = obj->dev;
-    u32 handle;
  int ret;
WARN_ON(!mutex_is_locked(&dev->object_name_lock));
  if (obj->handle_count++ == 0)
  drm_gem_object_get(obj);
+    ret = drm_vma_node_allow(&obj->vma_node, file_priv);
+    if (ret)
+    goto err_put;

Re: Subject [PATCH] drm/radeon: Fix eDP for single-display iMac11,2

2023-02-23 Thread Alex Deucher
I've applied this manually.  Please use git to generate and email
patches in the future.

Thanks!

Alex

On Sun, Feb 19, 2023 at 12:02 AM Mark Hawrylak  wrote:
>
> From Mark Hawrylak 
>
> Apple iMac11,2 (mid 2010) also with Radeon HD-4670 that has the same
> issue as iMac10,1 (late 2009) where the internal eDP panel stays dark on
> driver load.  This patch treats iMac11,2 the same as iMac10,1,
> so the eDP panel stays active.
>
> Additional steps:
> Kernel boot parameter radeon.nomodeset=0 required to keep the eDP
> panel active.
>
> This patch is an extension of the commit 
> 564d8a2cf3abf16575af48bdc3e86e92ee8a617d
> Subject: [PATCH 3.16 100/192] drm/radeon: Fix eDP for single-display iMac10,1 
> (v2)
> Date: Mon, 09 Oct 2017 13:44:24 +0100 [thread overview]
> https://lore.kernel.org/all/lsq.1507553064.833262...@decadent.org.uk/
>
> By making a contribution to this project, I certify that:
> The contribution was created in whole or in part by me and I have the 
> right to submit it under the open source license indicated in the file; or
> The contribution is based upon previous work that, to the best of my 
> knowledge, is covered under an appropriate open source license and I have the 
> right under that license to submit that work with modifications, whether 
> created in whole or in part by me, under the same open source license (unless 
> I am permitted to submit under a different license), as indicated in the 
> file; or
> The contribution was provided directly to me by some other person who 
> certified (a), (b) or (c) and I have not modified it.
> I understand and agree that this project and the contribution are 
> public and that a record of the contribution (including all personal 
> information I submit with it, including my sign-off) is maintained 
> indefinitely and may be redistributed consistent with this project or the 
> open source license(s) involved.
>
> Signed-off-by: Mark Hawrylak 
>
> ---
>
> --- linux/drivers/gpu/drm/radeon/atombios_encoders.c.orig   2023-02-19 
> 14:03:03.126499290 +1100
> +++ linux/drivers/gpu/drm/radeon/atombios_encoders.c2023-02-19 
> 14:04:15.449831506 +1100
> @@ -2122,11 +2122,11 @@ int radeon_atom_pick_dig_encoder(struct
>
> /*
>  * On DCE32 any encoder can drive any block so usually just use crtc 
> id,
> -* but Apple thinks different at least on iMac10,1, so there use 
> linkb,
> +* but Apple thinks different at least on iMac10,1 and iMac11,2, so 
> there use linkb,
>  * otherwise the internal eDP panel will stay dark.
>  */
> if (ASIC_IS_DCE32(rdev)) {
> -   if (dmi_match(DMI_PRODUCT_NAME, "iMac10,1"))
> +   if (dmi_match(DMI_PRODUCT_NAME, "iMac10,1") || 
> dmi_match(DMI_PRODUCT_NAME, "iMac11,2"))
> enc_idx = (dig->linkb) ? 1 : 0;
> else
> enc_idx = radeon_crtc->crtc_id;
>
>
> --
>
> Regards
> Mark Hawrylak
> 0425 714 725


Re: [PATCH v2 3/3] drm/amd/display: Remove unused local variables and function

2023-02-23 Thread Alex Deucher
Applied.  Thanks.

On Fri, Feb 17, 2023 at 1:15 PM Arthur Grillo  wrote:
>
> Remove a couple of local variables that are only set but never used,
> also remove an static utility function that is never used in consequence
> of the variable removal.
>
> This decrease the number of -Wunused-but-set-variable warnings.
>
> Signed-off-by: Arthur Grillo 
> ---
>  .../gpu/drm/amd/display/dc/dcn31/dcn31_apg.c  | 41 ---
>  1 file changed, 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_apg.c 
> b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_apg.c
> index 24e9ff65434d..05aac3e444b4 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_apg.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_apg.c
> @@ -72,40 +72,6 @@ static void apg31_disable(
> REG_UPDATE(APG_CONTROL2, APG_ENABLE, 0);
>  }
>
> -static union audio_cea_channels speakers_to_channels(
> -   struct audio_speaker_flags speaker_flags)
> -{
> -   union audio_cea_channels cea_channels = {0};
> -
> -   /* these are one to one */
> -   cea_channels.channels.FL = speaker_flags.FL_FR;
> -   cea_channels.channels.FR = speaker_flags.FL_FR;
> -   cea_channels.channels.LFE = speaker_flags.LFE;
> -   cea_channels.channels.FC = speaker_flags.FC;
> -
> -   /* if Rear Left and Right exist move RC speaker to channel 7
> -* otherwise to channel 5
> -*/
> -   if (speaker_flags.RL_RR) {
> -   cea_channels.channels.RL_RC = speaker_flags.RL_RR;
> -   cea_channels.channels.RR = speaker_flags.RL_RR;
> -   cea_channels.channels.RC_RLC_FLC = speaker_flags.RC;
> -   } else {
> -   cea_channels.channels.RL_RC = speaker_flags.RC;
> -   }
> -
> -   /* FRONT Left Right Center and REAR Left Right Center are exclusive */
> -   if (speaker_flags.FLC_FRC) {
> -   cea_channels.channels.RC_RLC_FLC = speaker_flags.FLC_FRC;
> -   cea_channels.channels.RRC_FRC = speaker_flags.FLC_FRC;
> -   } else {
> -   cea_channels.channels.RC_RLC_FLC = speaker_flags.RLC_RRC;
> -   cea_channels.channels.RRC_FRC = speaker_flags.RLC_RRC;
> -   }
> -
> -   return cea_channels;
> -}
> -
>  static void apg31_se_audio_setup(
> struct apg *apg,
> unsigned int az_inst,
> @@ -113,24 +79,17 @@ static void apg31_se_audio_setup(
>  {
> struct dcn31_apg *apg31 = DCN31_APG_FROM_APG(apg);
>
> -   uint32_t speakers = 0;
> -   uint32_t channels = 0;
> -
> ASSERT(audio_info);
> /* This should not happen.it does so we don't get BSOD*/
> if (audio_info == NULL)
> return;
>
> -   speakers = audio_info->flags.info.ALLSPEAKERS;
> -   channels = speakers_to_channels(audio_info->flags.speaker_flags).all;
> -
> /* DisplayPort only allows for one audio stream with stream ID 0 */
> REG_UPDATE(APG_CONTROL2, APG_DP_AUDIO_STREAM_ID, 0);
>
> /* When running in "pair mode", pairs of audio channels have their 
> own enable
>  * this is for really old audio drivers */
> REG_UPDATE(APG_DBG_GEN_CONTROL, APG_DBG_AUDIO_CHANNEL_ENABLE, 0xFF);
> -   // REG_UPDATE(APG_DBG_GEN_CONTROL, APG_DBG_AUDIO_CHANNEL_ENABLE, 
> channels);
>
> /* Disable forced mem power off */
> REG_UPDATE(APG_MEM_PWR, APG_MEM_PWR_FORCE, 0);
> --
> 2.39.2
>


Re: [PATCH v2 2/3] drm/amd/display: Remove unused local variables

2023-02-23 Thread Alex Deucher
Applied.  Thanks.

On Fri, Feb 17, 2023 at 1:15 PM Arthur Grillo  wrote:
>
> Remove local variables that were just set but were never used. This
> decrease the number of -Wunused-but-set-variable warnings.
>
> Signed-off-by: Arthur Grillo 
> ---
>  drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c  | 3 +--
>  drivers/gpu/drm/amd/display/dc/dcn201/dcn201_dpp.c | 7 ---
>  drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c   | 2 --
>  drivers/gpu/drm/amd/display/dc/dcn30/dcn30_afmt.c  | 2 --
>  drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hubp.c  | 4 
>  drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c | 3 ---
>  drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c  | 5 +
>  .../gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c  | 4 
>  .../drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.c  | 2 --
>  .../drm/amd/display/dc/link/protocols/link_dp_capability.c | 4 
>  10 files changed, 2 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c 
> b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
> index c4287147b853..ee08b545aaea 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_link_encoder.c
> @@ -1219,7 +1219,6 @@ void 
> dcn10_link_encoder_update_mst_stream_allocation_table(
> const struct link_mst_stream_allocation_table *table)
>  {
> struct dcn10_link_encoder *enc10 = TO_DCN10_LINK_ENC(enc);
> -   uint32_t value0 = 0;
> uint32_t value1 = 0;
> uint32_t value2 = 0;
> uint32_t slots = 0;
> @@ -1321,7 +1320,7 @@ void 
> dcn10_link_encoder_update_mst_stream_allocation_table(
> do {
> udelay(10);
>
> -   value0 = REG_READ(DP_MSE_SAT_UPDATE);
> +   REG_READ(DP_MSE_SAT_UPDATE);
>
> REG_GET(DP_MSE_SAT_UPDATE,
> DP_MSE_SAT_UPDATE, &value1);
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_dpp.c 
> b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_dpp.c
> index f50ab961bc17..a7268027a472 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_dpp.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_dpp.c
> @@ -185,13 +185,6 @@ static bool dpp201_get_optimal_number_of_taps(
> struct scaler_data *scl_data,
> const struct scaling_taps *in_taps)
>  {
> -   uint32_t pixel_width;
> -
> -   if (scl_data->viewport.width > scl_data->recout.width)
> -   pixel_width = scl_data->recout.width;
> -   else
> -   pixel_width = scl_data->viewport.width;
> -
> if (scl_data->viewport.width  != scl_data->h_active &&
> scl_data->viewport.height != scl_data->v_active &&
> dpp->caps->dscl_data_proc_format == 
> DSCL_DATA_PRCESSING_FIXED_FORMAT &&
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c 
> b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c
> index 61bcfa03c4e7..1aeb04fbd89d 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c
> @@ -541,8 +541,6 @@ void dcn201_pipe_control_lock(
> bool lock)
>  {
> struct dce_hwseq *hws = dc->hwseq;
> -   struct hubp *hubp = NULL;
> -   hubp = dc->res_pool->hubps[pipe->pipe_idx];
> /* use TG master update lock to lock everything on the TG
>  * therefore only top pipe need to lock
>  */
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_afmt.c 
> b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_afmt.c
> index 95528e5ef89e..55e388c4c98b 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_afmt.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_afmt.c
> @@ -123,7 +123,6 @@ void afmt3_se_audio_setup(
>  {
> struct dcn30_afmt *afmt3 = DCN30_AFMT_FROM_AFMT(afmt);
>
> -   uint32_t speakers = 0;
> uint32_t channels = 0;
>
> ASSERT(audio_info);
> @@ -131,7 +130,6 @@ void afmt3_se_audio_setup(
> if (audio_info == NULL)
> return;
>
> -   speakers = audio_info->flags.info.ALLSPEAKERS;
> channels = speakers_to_channels(audio_info->flags.speaker_flags).all;
>
> /* setup the audio stream source select (audio -> dig mapping) */
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hubp.c 
> b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hubp.c
> index dc3e8df706b3..e46bbe7ddcc9 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hubp.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hubp.c
> @@ -47,13 +47,9 @@ void hubp3_set_vm_system_aperture_settings(struct hubp 
> *hubp,
>  {
> struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp);
>
> -   PHYSICAL_ADDRESS_LOC mc_vm_apt_default;
> PHYSICAL_ADDRESS_LOC mc_vm_apt_low;
> PHYSICAL_ADDRESS_LOC mc_vm_apt_high;
>
> -   // The format of defaul

Re: [PATCH v2 1/3] drm/amd/display: Fix implicit enum conversion

2023-02-23 Thread Alex Deucher
This patch doesn't apply.  Please make sure you are using drm-next or
linux-next.

Alex

On Fri, Feb 17, 2023 at 1:15 PM Arthur Grillo  wrote:
>
> Make implicit enum conversion to avoid -Wenum-conversion warning, such
> as:
>
> drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:4109:88:
>  warning: implicit conversion from ‘enum ’ to ‘enum 
> odm_combine_mode’ [-Wenum-conversion]
>  4109 | 
> locals->ODMCombineEnablePerState[i][k] = true;
>   |   
>  ^
>
> Signed-off-by: Arthur Grillo 
> ---
>  .../amd/display/dc/dml/dcn20/display_mode_vba_20.c   |  9 +
>  .../amd/display/dc/dml/dcn20/display_mode_vba_20v2.c | 11 ++-
>  .../amd/display/dc/dml/dcn21/display_mode_vba_21.c   | 12 ++--
>  3 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c 
> b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c
> index d3b5b6fedf04..1b47249f01d8 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c
> @@ -26,6 +26,7 @@
>  #include "../display_mode_lib.h"
>  #include "display_mode_vba_20.h"
>  #include "../dml_inline_defs.h"
> +#include "dml/display_mode_enums.h"
>
>  /*
>   * NOTE:
> @@ -3897,14 +3898,14 @@ void 
> dml20_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l
> 
> mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine = 
> mode_lib->vba.PixelClock[k] / 2
> * (1 + 
> mode_lib->vba.DISPCLKDPPCLKDSCCLKDownSpreading / 100.0);
>
> -   locals->ODMCombineEnablePerState[i][k] = 
> false;
> +   locals->ODMCombineEnablePerState[i][k] = 
> dm_odm_combine_mode_disabled;
> mode_lib->vba.PlaneRequiredDISPCLK = 
> mode_lib->vba.PlaneRequiredDISPCLKWithoutODMCombine;
> if (mode_lib->vba.ODMCapability) {
> if 
> (locals->PlaneRequiredDISPCLKWithoutODMCombine > 
> mode_lib->vba.MaxDispclkRoundedDownToDFSGranularity) {
> -   
> locals->ODMCombineEnablePerState[i][k] = true;
> +   
> locals->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;
> 
> mode_lib->vba.PlaneRequiredDISPCLK = 
> mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine;
> } else if (locals->HActive[k] > 
> DCN20_MAX_420_IMAGE_WIDTH && locals->OutputFormat[k] == dm_420) {
> -   
> locals->ODMCombineEnablePerState[i][k] = true;
> +   
> locals->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;
> 
> mode_lib->vba.PlaneRequiredDISPCLK = 
> mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine;
> }
> }
> @@ -3957,7 +3958,7 @@ void dml20_ModeSupportAndSystemConfigurationFull(struct 
> display_mode_lib *mode_l
> locals->RequiredDISPCLK[i][j] = 0.0;
> locals->DISPCLK_DPPCLK_Support[i][j] = true;
> for (k = 0; k <= 
> mode_lib->vba.NumberOfActivePlanes - 1; k++) {
> -   
> locals->ODMCombineEnablePerState[i][k] = false;
> +   
> locals->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_disabled;
> if (locals->SwathWidthYSingleDPP[k] 
> <= locals->MaximumSwathWidth[k]) {
> locals->NoOfDPP[i][j][k] = 1;
> 
> locals->RequiredDPPCLK[i][j][k] = locals->MinDPPCLKUsingSingleDPP[k]
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20v2.c 
> b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20v2.c
> index edd098c7eb92..4781bf82eec6 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20v2.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20v2.c
> @@ -26,6 +26,7 @@
>  #include "../display_mode_lib.h"
>  #include "display_mode_vba_20v2.h"
>  #include "../dml_inline_defs.h"
> +#include "dml/display_mode_enums.h"
>
>  /*
>   * NOTE:
> @@ -4008,17 +4009,17 @@ void 
> dml20v2_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode
> 
> mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine = 
> mode_lib->vba.PixelClock[k] / 2
>   

RE: [PATCH] drm/amdgpu: support more AV1 encoding requests

2023-02-23 Thread Dong, Ruijing
[AMD Official Use Only - General]

Thanks Christian,

This is just to cover possible valid ways in kernel as a preparation, av1 
encoding in Mesa is still under developing.

Thanks,
Ruijing

-Original Message-
From: Christian König 
Sent: Thursday, February 23, 2023 1:48 AM
To: Wu, David ; amd-gfx@lists.freedesktop.org; Koenig, 
Christian 
Cc: Deucher, Alexander ; Dong, Ruijing 
; Liu, Leo 
Subject: Re: [PATCH] drm/amdgpu: support more AV1 encoding requests

Am 23.02.23 um 00:11 schrieb David (Ming Qiang) Wu:
> Ensuring accurate IB package searching and covers more corners for AV1
> encoding requests.

That at least looks much cleaner now. Do we already have the Mesa patches ready 
which use this?

Regards,
Christian.

>
> Signed-off-by: David (Ming Qiang) Wu 
> Reviewed-by: Ruijing Dong 
> ---
>   drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 81 +--
>   1 file changed, 75 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> index 22a41766a8c7..8235ff3820ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> @@ -1726,6 +1726,7 @@ static int vcn_v4_0_dec_msg(struct
> amdgpu_cs_parser *p, struct amdgpu_job *job,
>
>   #define RADEON_VCN_ENGINE_TYPE_ENCODE   (0x0002)
>   #define RADEON_VCN_ENGINE_TYPE_DECODE   (0x0003)
> +#define RADEON_VCN_ENGINE_TYPE_ENCODE_QUEUE  (0x0004)
>
>   #define RADEON_VCN_ENGINE_INFO  (0x3001)
>   #define RADEON_VCN_ENGINE_INFO_MAX_OFFSET   16
> @@ -1733,21 +1734,86 @@ static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser 
> *p, struct amdgpu_job *job,
>   #define RENCODE_ENCODE_STANDARD_AV1 2
>   #define RENCODE_IB_PARAM_SESSION_INIT   0x0003
>   #define RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET64
> +#define RENCODE_IB_ENC_QUE_INSTRUCTION   (0x3201)
> +#define RENCODE_IB_ENC_QUE_INSTRUCTION_MAX_OFFSET64
>
>   /* return the offset in ib if id is found, -1 otherwise
>* to speed up the searching we only search upto max_offset
>*/
> -static int vcn_v4_0_enc_find_ib_param(struct amdgpu_ib *ib, uint32_t
> id, int max_offset)
> +static int vcn_v4_0_enc_find_ib_param(uint32_t *ptr, int size,
> +uint32_t id, int max_offset)
>   {
>   int i;
>
> - for (i = 0; i < ib->length_dw && i < max_offset && ib->ptr[i] >= 8; i 
> += ib->ptr[i]/4) {
> - if (ib->ptr[i + 1] == id)
> + for (i = 0; i < size && i < max_offset && ptr[i] >= 8; i += ptr[i] / 4) 
> {
> + if (ptr[i + 1] == id)
>   return i;
>   }
>   return -1;
>   }
>
> +static int vcn_v4_0_enc_queue_msg(struct amdgpu_cs_parser *p,
> +   struct amdgpu_job *job,
> +   struct amdgpu_ib *ib)
> +{
> + struct ttm_operation_ctx ctx = { false, false };
> + struct amdgpu_bo_va_mapping *map;
> + struct amdgpu_bo *bo;
> + uint64_t start, end;
> + int i;
> + void *ptr;
> + int r;
> + int data_size = 0;
> + uint64_t addr;
> + uint32_t *msg;
> +
> + i = vcn_v4_0_enc_find_ib_param(ib->ptr, ib->length_dw, 
> RENCODE_IB_ENC_QUE_INSTRUCTION,
> + RENCODE_IB_ENC_QUE_INSTRUCTION_MAX_OFFSET);
> + if (i >= 0) {
> + addr = ((uint64_t)ib->ptr[i + 3]) << 32 | ib->ptr[i + 2];
> + data_size = ib->ptr[i + 4];
> + }
> +
> + if (!data_size) /* did not find */
> + return 0;
> +
> + addr &= AMDGPU_GMC_HOLE_MASK;
> + r = amdgpu_cs_find_mapping(p, addr, &bo, &map);
> + if (r) {
> + DRM_ERROR("Can't find BO for addr 0x%08llx\n", addr);
> + return r;
> + }
> +
> + start = map->start * AMDGPU_GPU_PAGE_SIZE;
> + end = (map->last + 1) * AMDGPU_GPU_PAGE_SIZE;
> + if (addr & 0x7) {
> + DRM_ERROR("VCN messages must be 8 byte aligned!\n");
> + return -EINVAL;
> + }
> +
> + bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> + amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
> + r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> + if (r) {
> + DRM_ERROR("Failed validating the VCN message BO (%d)!\n", r);
> + return r;
> + }
> +
> + r = amdgpu_bo_kmap(bo, &ptr);
> + if (r) {
> + DRM_ERROR("Failed mapping the VCN message (%d)!\n", r);
> + return r;
> + }
> +
> + msg = ptr + addr - start; /* IB with SESSION_INIT */
> + i = vcn_v4_0_enc_find_ib_param(msg, data_size, 
> RENCODE_IB_PARAM_SESSION_INIT,
> + RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET);
> + if (i >= 0 && msg[i + 2] == RENCODE_ENCODE_STANDARD_AV1)
> + r = vcn_v4_0_limit_sched(p, job);
> +
> + amdgpu_bo_kunmap(bo);
> + return r;
> +}
> +
>   static int vcn_v4_0_ring_patch_cs_in_

[linux-next:master] BUILD REGRESSION 0222aa9800b25ff171d6dcabcabcd5c42c6ffc3f

2023-02-23 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 0222aa9800b25ff171d6dcabcabcd5c42c6ffc3f  Add linux-next specific 
files for 20230223

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202302062224.byzetxh1-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302111601.jty4lkra-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202302210350.lynwcl4t-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn30/dcn30_optc.c:294:6: warning: no 
previous prototype for 'optc3_wait_drr_doublebuffer_pending_clear' 
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_resource_helpers.c:62:18: 
warning: variable 'cursor_bpp' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_detection.c:1199: warning: 
expecting prototype for dc_link_detect_connection_type(). Prototype was for 
link_detect_connection_type() instead
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:1292:32:
 warning: variable 'result_write_min_hblank' set but not used 
[-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_training.c:1586:38:
 warning: variable 'result' set but not used [-Wunused-but-set-variable]
include/asm-generic/div64.h:238:36: error: passing argument 1 of '__div64_32' 
from incompatible pointer type [-Werror=incompatible-pointer-types]

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_detection.c:warning:expecting-prototype-for-dc_link_detect_connection_type().-Prototype-was-for-link_detect_connection_type()-instead
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_capability.c:warning:variable-result_write_min_hblank-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_training.c:warning:variable-result-set-but-not-used
|-- alpha-randconfig-r002-20230222
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_detection.c:warning:expecting-prototype-for-dc_link_detect_connection_type().-Prototype-was-for-link_detect_connection_type()-instead
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_capability.c:warning:variable-result_write_min_hblank-set-but-not-used
|-- arc-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_detection.c:warning:expecting-prototype-for-dc_link_detect_connection_type().-Prototype-was-for-link_detect_connection_type()-instead
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_capability.c:warning:variable-result_write_min_hblank-set-but-not-used
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_training.c:warning:variable-result-set-but-not-used
|   `-- 
include-asm-generic-div64.h:error:passing-argument-of-__div64_32-from-incompatible-pointer-type
|-- arm-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_detection.c:warning:expecting-prototype-for-dc_link_detect_connection_type().-Prototype-was-for-link_detect_connection_type()-instead
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_capability.c:warning:variable-result_write_min_hblank-set-but-not-used
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_training.c:warning:variable-result-set-but-not-used
|   `-- 
include-asm-generic-div64.h:error:passing-argument-of-__div64_32-from-incompatible-pointer-type
|-- arm-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_detection.c:warning:expecting-prototype-for-dc_link_detect_connection_type().-Prototype-was-for-link_detect_connection_type()-instead
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_capability.c:warning:variable-result_write_min_hblank-set-but-not-used
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_training.c:warning:variable-result-set-but-not-used
|   `-- 
include-asm-generic-div64.h:error:passing-argument-of-__div64_32-from-incompatible-pointer-type
|-- arm64-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-dcn30-dcn30_optc.c:warning:no-previous-prototype-for-optc3_wait_drr_doublebuffer_pending_clear
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-dcn32-dcn32_resource_helpers.c:warning:variable-cursor_bpp-set-but-not-used
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_detection.c:warning:expecting-prototype-for-dc_link_detect_connection_type().-Prototype-was-for-link_detect_connection_type()-instead
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_capability.c:warning:variable-result_write_min_hblank-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_training.c:warning:variable-result-set-but-not-used

Re: [PATCH] drm/amdgpu: expose more memory stats in fdinfo

2023-02-23 Thread Marek Olšák
Updated patch attached.

Marek

On Mon, Feb 6, 2023 at 4:05 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Just two nit picks:
>
> +seq_printf(m, "drm-evicted-visible-vram:\t%llu KiB\n",
> +   stats.evicted_visible_vram/1024UL);
>
> For the values not standardized for all DRM drivers we might want to use
> amd as prefix here instead of drm.
>
> +uint64_t requested_gtt;/* how much userspace asked for */
>
> We used to have automated checkers complaining about comments after
> members.
>
> Kerneldoc complicent comments look like this:
>
>  /* @timestamp replaced by @rcu on dma_fence_release() */
>  struct rcu_head rcu;
>
> Apart from that looks good to me.
>
> Regards,
> Christian.
>
> Am 30.01.23 um 07:56 schrieb Marek Olšák:
> > Hi,
> >
> > This will be used for performance investigations. The patch is attached.
> >
> > Thanks,
> > Marek
>
>
From 3971ab629b17e15343ee428d32c3422f44c915bc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= 
Date: Mon, 30 Jan 2023 01:52:40 -0500
Subject: [PATCH] drm/amdgpu: expose more memory stats in fdinfo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This will be used for performance investigations.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 24 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 27 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 25 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 23 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  5 ++--
 5 files changed, 76 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 99a7855ab1bc..c57252f004e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -60,12 +60,13 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
 	struct amdgpu_fpriv *fpriv = file->driver_priv;
 	struct amdgpu_vm *vm = &fpriv->vm;
 
-	uint64_t vram_mem = 0, gtt_mem = 0, cpu_mem = 0;
+	struct amdgpu_mem_stats stats;
 	ktime_t usage[AMDGPU_HW_IP_NUM];
 	uint32_t bus, dev, fn, domain;
 	unsigned int hw_ip;
 	int ret;
 
+	memset(&stats, 0, sizeof(stats));
 	bus = adev->pdev->bus->number;
 	domain = pci_domain_nr(adev->pdev->bus);
 	dev = PCI_SLOT(adev->pdev->devfn);
@@ -75,7 +76,7 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
 	if (ret)
 		return;
 
-	amdgpu_vm_get_memory(vm, &vram_mem, >t_mem, &cpu_mem);
+	amdgpu_vm_get_memory(vm, &stats);
 	amdgpu_bo_unreserve(vm->root.bo);
 
 	amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage);
@@ -90,9 +91,22 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
 	seq_printf(m, "drm-driver:\t%s\n", file->minor->dev->driver->name);
 	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n", domain, bus, dev, fn);
 	seq_printf(m, "drm-client-id:\t%Lu\n", vm->immediate.fence_context);
-	seq_printf(m, "drm-memory-vram:\t%llu KiB\n", vram_mem/1024UL);
-	seq_printf(m, "drm-memory-gtt: \t%llu KiB\n", gtt_mem/1024UL);
-	seq_printf(m, "drm-memory-cpu: \t%llu KiB\n", cpu_mem/1024UL);
+	seq_printf(m, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL);
+	seq_printf(m, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL);
+	seq_printf(m, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL);
+	seq_printf(m, "amd-memory-visible-vram:\t%llu KiB\n",
+		   stats.visible_vram/1024UL);
+	seq_printf(m, "amd-evicted-vram:\t%llu KiB\n",
+		   stats.evicted_vram/1024UL);
+	seq_printf(m, "amd-evicted-visible-vram:\t%llu KiB\n",
+		   stats.evicted_visible_vram/1024UL);
+	seq_printf(m, "amd-requested-vram:\t%llu KiB\n",
+		   stats.requested_vram/1024UL);
+	seq_printf(m, "amd-requested-visible-vram:\t%llu KiB\n",
+		   stats.requested_visible_vram/1024UL);
+	seq_printf(m, "amd-requested-gtt:\t%llu KiB\n",
+		   stats.requested_gtt/1024UL);
+
 	for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
 		if (!usage[hw_ip])
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 1c3e647400bd..2681e3582f75 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1264,24 +1264,41 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
 	trace_amdgpu_bo_move(abo, new_mem->mem_type, old_mem->mem_type);
 }
 
-void amdgpu_bo_get_memory(struct amdgpu_bo *bo, uint64_t *vram_mem,
-uint64_t *gtt_mem, uint64_t *cpu_mem)
+void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
+			  struct amdgpu_mem_stats *stats)
 {
 	unsigned int domain;
+	uint64_t size = amdgpu_bo_size(bo);
 
 	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
 	switch (domain) {
 	case AMDGPU_GEM_DOMAIN_VRAM:
-		*vram_mem += amdgpu_bo_size(bo);
+		stats->vram += size;
+		if (amdgpu_bo_in_cpu_visible_vram(bo))
+			stats->visible_vram += size;
 		break;
 	case AMDGPU_GEM_DO

Re: [PATCH] drm/amd: use drm_kms_helper_poll_fini in amdgpu_device_suspend to avoid warning

2023-02-23 Thread Bert Karwatzki
When drm_kms_helper_poll_disable is used in amdgpu_device_suspend
without drm_kms_helper_poll_init having been called it causes a warning
in __flush_work:
https://gitlab.freedesktop.org/drm/amd/-/issues/2411
To avoid this one can use drm_kms_helper_poll_fini instead:
Send a second time because Evolution seems to have garbled the first
patch. 

>From 51cba3ae1e9f557cca8e37eb43b9b9310d0d504d Mon Sep 17 00:00:00 2001
From: Bert Karwatzki 
Date: Thu, 16 Feb 2023 10:34:11 +0100
Subject: [PATCH] Use drm_kms_helper_poll_fini instead of
 drm_kms_helper_poll_disable in amdgpu_device.c to avoid a warning from
 __flush_work.

Signed-off-by: Bert Karwatzki 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 drivers/gpu/drm/drm_probe_helper.c | 7 ---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b325f7039e0e..dc9e9868a84b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4145,7 +4145,7 @@ int amdgpu_device_suspend(struct drm_device *dev,
bool fbcon)
if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
DRM_WARN("smart shift update failed\n");
 
-   drm_kms_helper_poll_disable(dev);
+   drm_kms_helper_poll_fini(dev);
 
if (fbcon)
drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)-
> fb_helper, true);
diff --git a/drivers/gpu/drm/drm_probe_helper.c
b/drivers/gpu/drm/drm_probe_helper.c
index 8127be134c39..105d00d5ebf3 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -842,9 +842,10 @@ EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
  *
  * This function disables the output polling work.
  *
- * Drivers can call this helper from their device suspend
implementation. It is
- * not an error to call this even when output polling isn't enabled or
already
- * disabled. Polling is re-enabled by calling
drm_kms_helper_poll_enable().
+ * Drivers can call this helper from their device suspend
implementation. If it
+ * is not known if drm_kms_helper_poll_init has been called before the
driver
+ * should use drm_kms_helper_poll_fini_instead.
+ * Polling is re-enabled by calling drm_kms_helper_poll_enable().
  *
  * Note that calls to enable and disable polling must be strictly
ordered, which
  * is automatically the case when they're only call from
suspend/resume





amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-23 Thread Mikhail Gavrilov
Hi,
I have a laptop ASUS ROG Strix G15 Advantage Edition G513QY-HQ007. But
it is impossible to use without AC power because the system losts nvme
when I disconnect the power adapter.

Messages from kernel log when it happens:
nvme nvme0: controller is down; will reset: CSTS=0x, PCI_STATUS=0x10
nvme nvme0: Does your device have a faulty power saving mode enabled?
nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
and report a bug

I tried to use recommended parameters
(nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off) to resolve
this issue, but without successed.

In the linux-nvme mail list the last advice was to try the "pci=nocrs"
parameter.

But with this parameter the amdgpu driver refuses to work and makes
the system unbootable. I can solve the problem with the booting system
by blacklisting the driver but it is not a good solution, because I
don't wanna lose the GPU.

Why amdgpu not work with "pci=nocrs" ?
And is it possible to solve this incompatibility?
It is very important because when I boot the system without amdgpu
driver with "pci=nocrs" nvme is not losts when I disconnect the power
adapter. So "pci=nocrs" really helps.

Below that I see in kernel log when adds "pci=nocrs" parameter:

amdgpu :03:00.0: amdgpu: Fetched VBIOS from ATRM
amdgpu: ATOM BIOS: SWBRT77321.001
[drm] VCN(0) decode is enabled in VM mode
[drm] VCN(0) encode is enabled in VM mode
[drm] JPEG decode is enabled in VM mode
Console: switching to colour dummy device 80x25
amdgpu :03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature
disabled as experimental (default)
[drm] GPU posting now...
[drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment
size is 9-bit
amdgpu :03:00.0: amdgpu: VRAM: 12272M 0x0080 -
0x0082FEFF (12272M used)
amdgpu :03:00.0: amdgpu: GART: 512M 0x - 0x1FFF
amdgpu :03:00.0: amdgpu: AGP: 267894784M 0x0084 -
0x
[drm] Detected VRAM RAM=12272M, BAR=16384M
[drm] RAM width 192bits GDDR6
[drm] amdgpu: 12272M of VRAM memory ready
[drm] amdgpu: 31774M of GTT memory ready.
amdgpu :03:00.0: amdgpu: (-14) failed to allocate kernel bo
[drm] Debug VRAM access will use slowpath MM access
amdgpu :03:00.0: amdgpu: Failed to DMA MAP the dummy page
[drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block
 failed -12
amdgpu :03:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu :03:00.0: amdgpu: Fatal error during GPU init
amdgpu :03:00.0: amdgpu: amdgpu: finishing device.

Of course a full system log is also attached.

-- 
Best Regards,
Mike Gavrilov.


system-log-Fatal-error-during-GPU-init.tar.xz
Description: application/xz


RE: [PATCH] drm/amd: use drm_kms_helper_poll_fini in amdgpu_device_suspend to avoid warning

2023-02-23 Thread Chen, Guchun
Hi Bert,

Thanks for your patch. As we will can drm_kms_helper_poll_enable in resume, so 
it may not make sense using drm_kms_helper_poll_fini in suspend, from code 
pairing perspective.

For your case, is it possible to fix the problem by limiting the access of 
drm_kms_helper_poll_disable with checking mode_config_initialized in adev 
structure? We can get rid of the code change in drm core in this way.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Bert 
Karwatzki
Sent: Friday, February 24, 2023 4:52 AM
To: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amd: use drm_kms_helper_poll_fini in 
amdgpu_device_suspend to avoid warning

When drm_kms_helper_poll_disable is used in amdgpu_device_suspend without 
drm_kms_helper_poll_init having been called it causes a warning in __flush_work:
https://gitlab.freedesktop.org/drm/amd/-/issues/2411
To avoid this one can use drm_kms_helper_poll_fini instead:
Send a second time because Evolution seems to have garbled the first patch. 

From 51cba3ae1e9f557cca8e37eb43b9b9310d0d504d Mon Sep 17 00:00:00 2001
From: Bert Karwatzki 
Date: Thu, 16 Feb 2023 10:34:11 +0100
Subject: [PATCH] Use drm_kms_helper_poll_fini instead of
 drm_kms_helper_poll_disable in amdgpu_device.c to avoid a warning from
 __flush_work.

Signed-off-by: Bert Karwatzki 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 drivers/gpu/drm/drm_probe_helper.c | 7 ---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b325f7039e0e..dc9e9868a84b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4145,7 +4145,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
DRM_WARN("smart shift update failed\n");
 
-   drm_kms_helper_poll_disable(dev);
+   drm_kms_helper_poll_fini(dev);
 
if (fbcon)
drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)-
> fb_helper, true);
diff --git a/drivers/gpu/drm/drm_probe_helper.c
b/drivers/gpu/drm/drm_probe_helper.c
index 8127be134c39..105d00d5ebf3 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -842,9 +842,10 @@ EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
  *
  * This function disables the output polling work.
  *
- * Drivers can call this helper from their device suspend implementation. It is
- * not an error to call this even when output polling isn't enabled or already
- * disabled. Polling is re-enabled by calling drm_kms_helper_poll_enable().
+ * Drivers can call this helper from their device suspend
implementation. If it
+ * is not known if drm_kms_helper_poll_init has been called before the
driver
+ * should use drm_kms_helper_poll_fini_instead.
+ * Polling is re-enabled by calling drm_kms_helper_poll_enable().
  *
  * Note that calls to enable and disable polling must be strictly ordered, 
which
  * is automatically the case when they're only call from suspend/resume





[PATCH] drm/amdgpu: Make umc_v8_10_convert_error_address static and remove unused variable

2023-02-23 Thread Candice Li
Fixes following warnings:
warning: no previous prototype for 'umc_v8_10_convert_error_address'
warning: variable 'channel_index' set but not used

Reported-by: kernel test robot 
Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/umc_v8_10.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
index 66158219f791cb..fb55e8cb9967ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
@@ -209,10 +209,10 @@ static int umc_v8_10_swizzle_mode_na_to_pa(struct 
amdgpu_device *adev,
return 0;
 }
 
-void umc_v8_10_convert_error_address(struct amdgpu_device *adev,
-   struct ras_err_data *err_data, uint64_t 
err_addr,
-   uint32_t ch_inst, uint32_t umc_inst,
-   uint32_t node_inst, uint64_t mc_umc_status)
+static void umc_v8_10_convert_error_address(struct amdgpu_device *adev,
+   struct ras_err_data *err_data, 
uint64_t err_addr,
+   uint32_t ch_inst, uint32_t umc_inst,
+   uint32_t node_inst, uint64_t 
mc_umc_status)
 {
uint64_t na_err_addr_base;
uint64_t na_err_addr, retired_page_addr;
@@ -434,7 +434,7 @@ static void umc_v8_10_ecc_info_query_error_address(struct 
amdgpu_device *adev,
uint32_t umc_inst,
uint32_t node_inst)
 {
-   uint32_t eccinfo_table_idx, channel_index;
+   uint32_t eccinfo_table_idx;
uint64_t mc_umc_status, err_addr;
 
struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
@@ -443,11 +443,6 @@ static void umc_v8_10_ecc_info_query_error_address(struct 
amdgpu_device *adev,
  adev->umc.channel_inst_num +
  umc_inst * adev->umc.channel_inst_num +
  ch_inst;
-   channel_index =
-   adev->umc.channel_idx_tbl[node_inst * adev->umc.umc_inst_num *
- adev->umc.channel_inst_num +
- umc_inst * 
adev->umc.channel_inst_num +
- ch_inst];
 
mc_umc_status = ras->umc_ecc.ecc[eccinfo_table_idx].mca_umc_status;
 
-- 
2.17.1



RE: [PATCH] drm/amdgpu: Make umc_v8_10_convert_error_address static and remove unused variable

2023-02-23 Thread Zhou1, Tao
Reviewed-by: Tao Zhou 

> -Original Message-
> From: amd-gfx  On Behalf Of Candice
> Li
> Sent: Friday, February 24, 2023 12:25 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Li, Candice 
> Subject: [PATCH] drm/amdgpu: Make umc_v8_10_convert_error_address static
> and remove unused variable
> 
> Fixes following warnings:
> warning: no previous prototype for 'umc_v8_10_convert_error_address'
> warning: variable 'channel_index' set but not used
> 
> Reported-by: kernel test robot 
> Signed-off-by: Candice Li 
> ---
>  drivers/gpu/drm/amd/amdgpu/umc_v8_10.c | 15 +--
>  1 file changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
> b/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
> index 66158219f791cb..fb55e8cb9967ad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
> @@ -209,10 +209,10 @@ static int umc_v8_10_swizzle_mode_na_to_pa(struct
> amdgpu_device *adev,
>   return 0;
>  }
> 
> -void umc_v8_10_convert_error_address(struct amdgpu_device *adev,
> - struct ras_err_data *err_data, uint64_t
> err_addr,
> - uint32_t ch_inst, uint32_t umc_inst,
> - uint32_t node_inst, uint64_t mc_umc_status)
> +static void umc_v8_10_convert_error_address(struct amdgpu_device *adev,
> + struct ras_err_data *err_data,
> uint64_t err_addr,
> + uint32_t ch_inst, uint32_t umc_inst,
> + uint32_t node_inst, uint64_t
> mc_umc_status)
>  {
>   uint64_t na_err_addr_base;
>   uint64_t na_err_addr, retired_page_addr; @@ -434,7 +434,7 @@ static
> void umc_v8_10_ecc_info_query_error_address(struct amdgpu_device *adev,
>   uint32_t umc_inst,
>   uint32_t node_inst)
>  {
> - uint32_t eccinfo_table_idx, channel_index;
> + uint32_t eccinfo_table_idx;
>   uint64_t mc_umc_status, err_addr;
> 
>   struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); @@ -443,11
> +443,6 @@ static void umc_v8_10_ecc_info_query_error_address(struct
> amdgpu_device *adev,
> adev->umc.channel_inst_num +
> umc_inst * adev->umc.channel_inst_num +
> ch_inst;
> - channel_index =
> - adev->umc.channel_idx_tbl[node_inst * adev-
> >umc.umc_inst_num *
> -   adev->umc.channel_inst_num
> +
> -   umc_inst * adev-
> >umc.channel_inst_num +
> -   ch_inst];
> 
>   mc_umc_status = ras-
> >umc_ecc.ecc[eccinfo_table_idx].mca_umc_status;
> 
> --
> 2.17.1



Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-23 Thread Christian König

Hi Mikhail,

this is pretty clearly a problem with the system and/or it's BIOS and 
not the GPU hw or the driver.


The option pci=nocrs makes the kernel ignore additional resource windows 
the BIOS reports through ACPI. This then most likely leads to problems 
with amdgpu because it can't bring up its PCIe resources any more.


The output of "sudo lspci - -s $BUSID_OF_AMDGPU" might help 
understand the problem, but I strongly suggest to try a BIOS update first.


Regards,
Christian.

Am 24.02.23 um 00:40 schrieb Mikhail Gavrilov:

Hi,
I have a laptop ASUS ROG Strix G15 Advantage Edition G513QY-HQ007. But
it is impossible to use without AC power because the system losts nvme
when I disconnect the power adapter.

Messages from kernel log when it happens:
nvme nvme0: controller is down; will reset: CSTS=0x, PCI_STATUS=0x10
nvme nvme0: Does your device have a faulty power saving mode enabled?
nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
and report a bug

I tried to use recommended parameters
(nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off) to resolve
this issue, but without successed.

In the linux-nvme mail list the last advice was to try the "pci=nocrs"
parameter.

But with this parameter the amdgpu driver refuses to work and makes
the system unbootable. I can solve the problem with the booting system
by blacklisting the driver but it is not a good solution, because I
don't wanna lose the GPU.

Why amdgpu not work with "pci=nocrs" ?
And is it possible to solve this incompatibility?
It is very important because when I boot the system without amdgpu
driver with "pci=nocrs" nvme is not losts when I disconnect the power
adapter. So "pci=nocrs" really helps.

Below that I see in kernel log when adds "pci=nocrs" parameter:

amdgpu :03:00.0: amdgpu: Fetched VBIOS from ATRM
amdgpu: ATOM BIOS: SWBRT77321.001
[drm] VCN(0) decode is enabled in VM mode
[drm] VCN(0) encode is enabled in VM mode
[drm] JPEG decode is enabled in VM mode
Console: switching to colour dummy device 80x25
amdgpu :03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature
disabled as experimental (default)
[drm] GPU posting now...
[drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment
size is 9-bit
amdgpu :03:00.0: amdgpu: VRAM: 12272M 0x0080 -
0x0082FEFF (12272M used)
amdgpu :03:00.0: amdgpu: GART: 512M 0x - 0x1FFF
amdgpu :03:00.0: amdgpu: AGP: 267894784M 0x0084 -
0x
[drm] Detected VRAM RAM=12272M, BAR=16384M
[drm] RAM width 192bits GDDR6
[drm] amdgpu: 12272M of VRAM memory ready
[drm] amdgpu: 31774M of GTT memory ready.
amdgpu :03:00.0: amdgpu: (-14) failed to allocate kernel bo
[drm] Debug VRAM access will use slowpath MM access
amdgpu :03:00.0: amdgpu: Failed to DMA MAP the dummy page
[drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block
 failed -12
amdgpu :03:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu :03:00.0: amdgpu: Fatal error during GPU init
amdgpu :03:00.0: amdgpu: amdgpu: finishing device.

Of course a full system log is also attached.