[PATCH] drm/amd/display: Remove the repeated dpp1_full_bypass declaration

2021-06-17 Thread Shaokun Zhang
Function 'dpp1_full_bypass' is declared twice, so remove the repeated
declaration and the unnecessary blank line.

Cc: Harry Wentland 
Cc: Leo Li 
Cc: Alex Deucher 
Signed-off-by: Shaokun Zhang 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
index 9a1f40eb5c47..71b3a6949001 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
@@ -1497,8 +1497,6 @@ void dpp1_cnv_setup (
enum dc_color_space input_color_space,
struct cnv_alpha_2bit_lut *alpha_2bit_lut);
 
-void dpp1_full_bypass(struct dpp *dpp_base);
-
 void dpp1_dppclk_control(
struct dpp *dpp_base,
bool dppclk_div,
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread Felix Kuehling
On 2021-06-17 at 8:41 a.m., Pan, Xinhui wrote:
> Felix
> What I am wondering is: if the CP hangs, can we assume all usermode 
> queues have stopped?
> If so, we can do the cleanup work regardless of the retval of 
> execute_queues_cpsch().

Right. That's what we currently do with ETIME, which happens when the
hang is first detected. I don't know why we need to treat the case
differently when we're already in a reset.

Regards,
  Felix


>
>> On 2021-06-17 at 20:11, Pan, Xinhui wrote:
>>
>> Felix
>> what I am thinking of, like below, looks simpler. :)
>>
>> @@ -1501,6 +1501,11 @@ static int destroy_queue_cpsch(struct 
>> device_queue_manager *dqm,
>>/* remove queue from list to prevent rescheduling after preemption */
>>dqm_lock(dqm);
>>
>> +   if (dqm->is_hws_hang) {
>> +   retval = -EIO;
>> +   goto failed_try_destroy_debugged_queue;
>> +   }
>> +
>>if (qpd->is_debug) {
>>/*
>> * error, currently we do not allow to destroy a queue
>>
>>> On 2021-06-17 at 20:02, Pan, Xinhui wrote:
>>>
>>> Handle queue destroy failure while the CP is hung.
>>> Once the CP hangs, kfd triggers a GPU reset and sets the related flags
>>> to stop the driver from touching the queue. As we leave the queue as it
>>> is, we need to keep its resources as they are too.
>>>
>>> Regardless of whether user space tries to destroy the queue again or
>>> not, we need to put the queue back on the list so that process
>>> termination does the cleanup work. What's more, if user space does try
>>> to destroy the queue again, we will not free its resources twice.
>>>
>>> KFD returns -EIO in this case, so let's handle it now.
>>>
>>> Some error logs seen without this patch are pasted below.
>>>
>>> amdgpu: Can't create new usermode queue because -1 queues were already
>>> created
>>>
>>> refcount_t: underflow; use-after-free.
>>> Call Trace:
>>> kobject_put+0xe6/0x1b0
>>> kfd_procfs_del_queue+0x37/0x50 [amdgpu]
>>> pqm_destroy_queue+0x17a/0x390 [amdgpu]
>>> kfd_ioctl_destroy_queue+0x57/0xc0 [amdgpu]
>>> kfd_ioctl+0x463/0x690 [amdgpu]
>>>
>>> BUG kmalloc-32 (Tainted: GW): Object already free
>>> INFO: Allocated in allocate_sdma_mqd+0x30/0xb0 [amdgpu] age=4796 cpu=2
>>> pid=2511
>>> __slab_alloc+0x72/0x80
>>> kmem_cache_alloc_trace+0x81f/0x8c0
>>> allocate_sdma_mqd+0x30/0xb0 [amdgpu]
>>> create_queue_cpsch+0xbf/0x470 [amdgpu]
>>> pqm_create_queue+0x28d/0x6d0 [amdgpu]
>>> kfd_ioctl_create_queue+0x492/0xae0 [amdgpu]
>>> INFO: Freed in free_mqd_hiq_sdma+0x20/0x60 [amdgpu] age=2537 cpu=7
>>> pid=2511
>>> kfree+0x322/0x340
>>> free_mqd_hiq_sdma+0x20/0x60 [amdgpu]
>>> destroy_queue_cpsch+0x20c/0x330 [amdgpu]
>>> pqm_destroy_queue+0x1a3/0x390 [amdgpu]
>>> kfd_ioctl_destroy_queue+0x57/0xc0 [amdgpu]
>>>
>>> Signed-off-by: xinhui pan 
>>> ---
>>> .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   | 13 +
>>> drivers/gpu/drm/amd/amdkfd/kfd_process.c|  4 +++-
>>> .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c  |  2 ++
>>> 3 files changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> index c069fa259b30..63a9a19a3987 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> @@ -1530,6 +1530,11 @@ static int destroy_queue_cpsch(struct 
>>> device_queue_manager *dqm,
>>> KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
>>> if (retval == -ETIME)
>>> qpd->reset_wavefronts = true;
>>> +   /* In gpu reset? We leave the queue as it is, so do NOT
>>> +* cleanup the resource.
>>> +*/
>>> +   else if (retval == -EIO)
>>> +   goto failed_execute_queue;
>>> if (q->properties.is_gws) {
>>> dqm->gws_queue_count--;
>>> qpd->mapped_gws_queue = false;
>>> @@ -1551,6 +1556,14 @@ static int destroy_queue_cpsch(struct 
>>> device_queue_manager *dqm,
>>>
>>> return retval;
>>>
>>> +failed_execute_queue:
>>> +   /* Put queue back to the list, then we have chance to destroy it.
>>> +* FIXME: we do NOT want the queue in the runlist again.
>>> +*/
>>> +   list_add(&q->list, &qpd->queues_list);
>>> +   qpd->queue_count++;
>>> +   if (q->properties.is_active)
>>> +   increment_queue_count(dqm, q->properties.type);
>>> failed_try_destroy_debugged_queue:
>>>
>>> dqm_unlock(dqm);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index 09b98a83f670..984197e5929f 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -607,11 +607,13 @@ static int kfd_procfs_add_sysfs_files(struct 
>>> kfd_process *p)
>>>
>>> void kfd_procfs_del_queue(struct queue *q)
>>> {
>>> -   if (!q)
>>> +   if (!q || !kobject_get_unless_zero(&q->kobj))
>>> 

Re: [PATCH v2 2/2] drm/amdkfd: Walk through list with dqm lock hold

2021-06-17 Thread Felix Kuehling
On 2021-06-17 at 8:02 a.m., xinhui pan wrote:
> To avoid any list corruption.
>
> Signed-off-by: xinhui pan 

This patch is

Reviewed-by: Felix Kuehling 


> ---
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 ++-
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 63a9a19a3987..d62374746c93 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1722,7 +1722,7 @@ static int process_termination_cpsch(struct 
> device_queue_manager *dqm,
>   struct qcm_process_device *qpd)
>  {
>   int retval;
> - struct queue *q, *next;
> + struct queue *q;
>   struct kernel_queue *kq, *kq_next;
>   struct mqd_manager *mqd_mgr;
>   struct device_process_node *cur, *next_dpn;
> @@ -1779,24 +1779,26 @@ static int process_termination_cpsch(struct 
> device_queue_manager *dqm,
>   qpd->reset_wavefronts = false;
>   }
>  
> - dqm_unlock(dqm);
> -
> - /* Outside the DQM lock because under the DQM lock we can't do
> -  * reclaim or take other locks that others hold while reclaiming.
> -  */
> - if (found)
> - kfd_dec_compute_active(dqm->dev);
> -
>   /* Lastly, free mqd resources.
>* Do free_mqd() after dqm_unlock to avoid circular locking.
>*/
> - list_for_each_entry_safe(q, next, &qpd->queues_list, list) {
> + while (!list_empty(&qpd->queues_list)) {
> + q = list_first_entry(&qpd->queues_list, struct queue, list);
>   mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
>   q->properties.type)];
>   list_del(&q->list);
>   qpd->queue_count--;
> + dqm_unlock(dqm);
>   mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
> + dqm_lock(dqm);
>   }
> + dqm_unlock(dqm);
> +
> + /* Outside the DQM lock because under the DQM lock we can't do
> +  * reclaim or take other locks that others hold while reclaiming.
> +  */
> + if (found)
> + kfd_dec_compute_active(dqm->dev);
>  
>   return retval;
>  }


Re: [PATCH v3 1/8] ext4/xfs: add page refcount helper

2021-06-17 Thread Dave Chinner
On Thu, Jun 17, 2021 at 10:16:58AM -0500, Alex Sierra wrote:
> From: Ralph Campbell 
> 
> There are several places where ZONE_DEVICE struct pages assume a reference
> count == 1 means the page is idle and free. Instead of open coding this,
> add a helper function to hide this detail.
> 
> v2:
> [AS]: rename dax_layout_is_idle_page func to dax_page_unused

Did you even compile test this?

> Signed-off-by: Ralph Campbell 
> Signed-off-by: Alex Sierra 
> ---
>  fs/dax.c|  4 ++--
>  fs/ext4/inode.c |  5 +
>  fs/xfs/xfs_file.c   |  4 +---
>  include/linux/dax.h | 10 ++
>  4 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 26d5dcd2d69e..321f4ddc6643 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -358,7 +358,7 @@ static void dax_disassociate_entry(void *entry, struct 
> address_space *mapping,
>   for_each_mapped_pfn(entry, pfn) {
>   struct page *page = pfn_to_page(pfn);
>  
> - WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> + WARN_ON_ONCE(trunc && !dax_layout_is_idle_page(page));

Because you still use dax_layout_is_idle_page() here, not
dax_page_unused()...

>   WARN_ON_ONCE(page->mapping && page->mapping != mapping);
>   page->mapping = NULL;
>   page->index = 0;
> @@ -372,7 +372,7 @@ static struct page *dax_busy_page(void *entry)
>   for_each_mapped_pfn(entry, pfn) {
>   struct page *page = pfn_to_page(pfn);
>  
> - if (page_ref_count(page) > 1)
> + if (!dax_layout_is_idle_page(page))

Here too.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] Revert "drm/amd/display: Fix overlay validation by considering cursors"

2021-06-17 Thread Harry Wentland
On 2021-06-16 1:14 p.m., Sean Paul wrote:
> On Wed, Jun 16, 2021 at 12:21 PM Rodrigo Siqueira
>  wrote:
>>
>> This reverts commit 04cc17a951f73f9a9092ca572b063e6292aeb085.
>>
>> The patch that we are reverting here was originally applied because it
>> fixes multiple IGT issues and flickering in Android. However, after a
>> discussion with Sean Paul and Mark, it looks like this patch might
>> cause problems on ChromeOS. For this reason, we decided to revert this
>> patch.
> 
> Thanks for sending this, Siqueira!
> 
> To be clear for those unfamiliar, the issue extends beyond ChromeOS
> (we're not just pushing our compositor problems on the rest of the
> community).
> 
> Relying on cursor enable/disable for atomic creates non-deterministic
> behavior which would be very hard for any compositor to reason out
> without knowing the hardware-specific limitations. The case I'm
> worried about is that the compositor has an overlay active without the
> cursor and at some point the compositor enables the cursor which will
> fail because of the overlay.
> 

Previous discussion highlighted that the cursor IOCTL should never
fail and that userspace generally is not well equipped to deal with
it if it does.

https://patchwork.freedesktop.org/patch/387230/
https://patchwork.freedesktop.org/patch/389084/

Reviewed-by: Harry Wentland 

Harry

> Reviewed-by: Sean Paul 
> 
>>
>> Cc: Nicholas Kazlauskas 
>> Cc: Harry Wentland 
>> Cc: Hersen Wu 
>> Cc: Sean Paul 
>> Cc: Mark Yacoub 
>> Cc: Greg Kroah-Hartman 
>> Signed-off-by: Rodrigo Siqueira 
>> ---
>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 ++--
>>  1 file changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index 8358112b5822..3fd41e098c90 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -10200,8 +10200,8 @@ static int validate_overlay(struct drm_atomic_state 
>> *state)
>>  {
>> int i;
>> struct drm_plane *plane;
>> -   struct drm_plane_state *new_plane_state;
>> -   struct drm_plane_state *primary_state, *cursor_state, *overlay_state 
>> = NULL;
>> +   struct drm_plane_state *old_plane_state, *new_plane_state;
>> +   struct drm_plane_state *primary_state, *overlay_state = NULL;
>>
>> /* Check if primary plane is contained inside overlay */
>> for_each_new_plane_in_state_reverse(state, plane, new_plane_state, 
>> i) {
>> @@ -10231,14 +10231,6 @@ static int validate_overlay(struct drm_atomic_state 
>> *state)
>> if (!primary_state->crtc)
>> return 0;
>>
>> -   /* check if cursor plane is enabled */
>> -   cursor_state = drm_atomic_get_plane_state(state, 
>> overlay_state->crtc->cursor);
>> -   if (IS_ERR(cursor_state))
>> -   return PTR_ERR(cursor_state);
>> -
>> -   if (drm_atomic_plane_disabling(plane->state, cursor_state))
>> -   return 0;
>> -
>> /* Perform the bounds check to ensure the overlay plane covers the 
>> primary */
>> if (primary_state->crtc_x < overlay_state->crtc_x ||
>> primary_state->crtc_y < overlay_state->crtc_y ||
>> --
>> 2.25.1
>>



[PATCH umr 2/3] Generalize decoding of PDEs and PTEs in AI+

2021-06-17 Thread Joseph Greathouse
Brings decoding of PDEs and PTEs for AI+ chips into their own
functions, so that we don't end up with subtly different decoding
bugs in the variety of places such decodings are done.

Also fixes a minor bug where we were pulling PTE.PRT from bit 61
instead of the proper bit 51.

Signed-off-by: Joseph Greathouse 
---
 src/lib/read_vram.c | 187 ++--
 1 file changed, 109 insertions(+), 78 deletions(-)

diff --git a/src/lib/read_vram.c b/src/lib/read_vram.c
index 049acd4..2998873 100644
--- a/src/lib/read_vram.c
+++ b/src/lib/read_vram.c
@@ -317,6 +317,104 @@ static uint64_t log2_vm_size(uint64_t 
page_table_start_addr, uint64_t page_table
return vm_bits;
 }
 
+typedef struct {
+   uint64_t
+   frag_size,
+   pte_base_addr,
+   valid,
+   system,
+   coherent,
+   pte,
+   further;
+} pde_fields_ai_t;
+
+typedef struct {
+   uint64_t
+   valid,
+   system,
+   coherent,
+   tmz,
+   execute,
+   read,
+   write,
+   fragment,
+   page_base_addr,
+   prt,
+   pde,
+   further,
+   mtype;
+} pte_fields_ai_t;
+
+/*
+ * PDE format on AI:
+ * 63:59 block fragment size
+ * 58:55 reserved
+ *   But if bit 56 is set, this is a PTE with 'further' set,
+ *   which makes it act like a PDE.
+ * 54 pde-is-pte
+ * 53:48 reserved
+ * 47:6 physical base address of PTE
+ * 2 cache coherent/snoop
+ * 1 system
+ * 0 valid
+ */
+static pde_fields_ai_t decode_pde_entry_ai(uint64_t pde_entry)
+{
+   pde_fields_ai_t pde_fields;
+   pde_fields.frag_size = (pde_entry >> 59) & 0x1F;
+   pde_fields.pte_base_addr = pde_entry & 0xFFFFFFFFFFC0ULL;
+   pde_fields.valid = pde_entry & 1;
+   pde_fields.system= (pde_entry >> 1) & 1;
+   pde_fields.coherent  = (pde_entry >> 2) & 1;
+   pde_fields.pte   = (pde_entry >> 54) & 1;
+   pde_fields.further   = (pde_entry >> 56) & 1;
+   return pde_fields;
+}
+
+/*
+ * PTE format on AI and PI:
+ * 58:57 mtype
+ * 56 further
+ * 54 reserved
+ *   But if it is set, then this is actually a PDE with 'P'
+ *   bit set, which makes the PDE act like a PTE.
+ * 51 prt
+ * 47:12 4k physical page base address
+ * 11:7 fragment
+ * 6 write
+ * 5 read
+ * 4 exe
+ * 3 tmz (PI+)
+ * 2 snooped / coherent
+ * 1 system
+ * 0 valid
+ */
+static pte_fields_ai_t decode_pte_entry_ai(uint64_t pte_entry)
+{
+   pte_fields_ai_t pte_fields;
+   pte_fields.valid  = pte_entry & 1;
+   pte_fields.system = (pte_entry >> 1) & 1;
+   pte_fields.coherent   = (pte_entry >> 2) & 1;
+   pte_fields.tmz= (pte_entry >> 3) & 1;
+   pte_fields.execute= (pte_entry >> 4) & 1;
+   pte_fields.read   = (pte_entry >> 5) & 1;
+   pte_fields.write  = (pte_entry >> 6) & 1;
+   pte_fields.fragment   = (pte_entry >> 7) & 0x1F;
+   pte_fields.prt= (pte_entry >> 51) & 1;
+   pte_fields.pde= (pte_entry >> 54) & 1;
+   pte_fields.further= (pte_entry >> 56) & 1;
+   pte_fields.mtype  = (pte_entry >> 57) & 3;
+
+   // PTEs hold physical address in 47:12
+   // PDEs hold physical address in 47:6, so if this is a PTE-as-PDE (further), we need a different mask
+   if (pte_fields.further)
+   pte_fields.page_base_addr = pte_entry & 0xFFFFFFFFFFC0ULL;
+   else
+   pte_fields.page_base_addr = pte_entry & 0xFFFFFFFFF000ULL;
+
+   return pte_fields;
+}
+
 /**
  * umr_access_vram_ai - Access GPU mapped memory for GFX9+ platforms
  */
@@ -352,24 +450,9 @@ static int umr_access_vram_ai(struct umr_asic *asic, 
uint32_t vmid,
mmMC_VM_AGP_BOT,
mmMC_VM_AGP_TOP;
} registers;
-   struct {
-   uint64_t
-   frag_size,
-   pte_base_addr,
-   valid,
-   system,
-   cache,
-   pte;
-   } pde_fields, pde_array[8];
-   struct {
-   uint64_t
-   page_base_addr,
-   fragment,
-   system,
-   valid,
-   prt,
-   further;
-   } pte_fields;
+
+   pde_fields_ai_t pde_fields, pde_array[8];
+   pte_fields_ai_t pte_fields;
char buf[64];
unsigned char *pdst = dst;
char *hub, *vm0prefix, *regprefix;
@@ -379,27 +462,6 @@ static int umr_access_vram_ai(struct umr_asic *asic, 
uint32_t vmid,
memset(&registers, 0, sizeof registers);
memset(&pde_array, 0xff, sizeof pde_array);
 
-   /*
-* PTE format on AI:
-* 47:12 4k physical page base address
-* 11:7 

[PATCH umr 3/3] Enhance printing of page tables in AI+

2021-06-17 Thread Joseph Greathouse
Pulls print functions for GPUVM page tables on AI+ chips into their
own set of generalized functions, so that we don't have subtly
different printouts for different layers.

Explicitly prints PDEs with P bit (which makes it a PTE) and makes
the PTE with F bit set (further, which makes it a PDE) properly
indent the next layer of the print.

Prints remaining fields from the PTE and PDE printouts, such as
read/write/execute bits and MTYPE from PTE.

Signed-off-by: Joseph Greathouse 
---
 src/lib/read_vram.c | 184 ++--
 1 file changed, 127 insertions(+), 57 deletions(-)

diff --git a/src/lib/read_vram.c b/src/lib/read_vram.c
index 2998873..cb38b60 100644
--- a/src/lib/read_vram.c
+++ b/src/lib/read_vram.c
@@ -415,6 +415,112 @@ static pte_fields_ai_t decode_pte_entry_ai(uint64_t 
pte_entry)
return pte_fields;
 }
 
+static void print_pde_fields_ai(struct umr_asic *asic,
+   pde_fields_ai_t pde_fields)
+{
+   asic->mem_funcs.vm_message(
+   ", PBA==0x%012" PRIx64 ", V=%" PRIu64
+   ", S=%" PRIu64 ", C=%" PRIu64
+   ", P=%" PRIu64 ", FS=%" PRIu64 "\n",
+   pde_fields.pte_base_addr,
+   pde_fields.valid,
+   pde_fields.system,
+   pde_fields.coherent,
+   pde_fields.pte,
+   pde_fields.frag_size);
+}
+static void print_base_ai(struct umr_asic *asic,
+ uint64_t pde_entry, uint64_t address,
+ uint64_t va_mask, pde_fields_ai_t pde_fields,
+ int is_base_not_pde)
+{
+   if (is_base_not_pde)
+   asic->mem_funcs.vm_message("BASE");
+   else
+   asic->mem_funcs.vm_message("PDE");
+   asic->mem_funcs.vm_message("=0x%016" PRIx64 ", VA=0x%012" PRIx64,
+   pde_entry,
+   address & va_mask);
+   print_pde_fields_ai(asic, pde_fields);
+}
+
+static void print_pde_ai(struct umr_asic *asic,
+   const char * indentation, int pde_cnt,
+   int page_table_depth, uint64_t prev_addr,
+   uint64_t pde_idx, uint64_t pde_entry, uint64_t address,
+   uint64_t va_mask, pde_fields_ai_t pde_fields)
+{
+   asic->mem_funcs.vm_message("%s ", &indentation[18-pde_cnt*3]);
+   if (pde_fields.further)
+   asic->mem_funcs.vm_message("PTE-FURTHER");
+   else
+   asic->mem_funcs.vm_message("PDE%d", page_table_depth - pde_cnt);
+
+   asic->mem_funcs.vm_message("@{0x%" PRIx64 "/%" PRIx64
+   "}=0x%016" PRIx64 ", VA=0x%012" PRIx64,
+   prev_addr,
+   pde_idx,
+   pde_entry,
+   address & va_mask);
+   print_pde_fields_ai(asic, pde_fields);
+}
+
+static void print_pte_ai(struct umr_asic *asic,
+   const char * indentation, int pde_cnt, uint64_t prev_addr,
+   uint64_t pte_idx, uint64_t pte_entry, uint64_t address,
+   uint64_t va_mask, pte_fields_ai_t pte_fields)
+{
+   if (asic == NULL) {
+   asic->mem_funcs.vm_message("\\-> PTE");
+   } else {
+   asic->mem_funcs.vm_message("%s ",
+   &indentation[18-pde_cnt*3]);
+   if (pte_fields.pde)
+   asic->mem_funcs.vm_message("PDE0-as-PTE");
+   else
+   asic->mem_funcs.vm_message("PTE");
+   asic->mem_funcs.vm_message("@{0x%" PRIx64 "/%" PRIx64"}",
+   prev_addr,
+   pte_idx);
+   }
+   asic->mem_funcs.vm_message("=0x%016" PRIx64 ", VA=0x%012" PRIx64
+   ", PBA==0x%012" PRIx64 ", V=%" PRIu64
+   ", S=%" PRIu64 ", C=%" PRIu64 ", Z=%" PRIu64
+   ", X=%" PRIu64 ", R=%" PRIu64 ", W=%" PRIu64
+   ", FS=%" PRIu64 ", T=%" PRIu64 ", MTYPE=",
+   pte_entry,
+   address & va_mask,
+   pte_fields.page_base_addr,
+   pte_fields.valid,
+   pte_fields.system,
+   pte_fields.coherent,
+   pte_fields.tmz,
+   pte_fields.execute,
+   pte_fields.read,
+   pte_fields.write,
+   pte_fields.fragment,
+   pte_fields.prt,
+   pte_fields.mtype);
+   switch (pte_fields.mtype) {
+   case 0:
+   asic->mem_funcs.vm_message("NC\n");
+   break;
+   case 1:
+   asic->mem_funcs.vm_message("RW\n");
+   break;
+   case 2:
+   asic->mem_funcs.vm_message("CC\n");
+  

Re: [PATCH -next] drm/amd/display: remove unused variable 'dc'

2021-06-17 Thread Harry Wentland


On 2021-06-16 9:16 p.m., Pu Lehui wrote:
> GCC reports the following warning with W=1:
> 
> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_psr.c:70:13:
> warning:
>  variable ‘dc’ set but not used [-Wunused-but-set-variable]
> 70 |  struct dc *dc = NULL;
>| ^~
> 
> This variable is not used in the function; remove it to fix the
> warning.
> 
> Signed-off-by: Pu Lehui 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> index f7c77ae0d965..70a554f1e725 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> @@ -67,14 +67,12 @@ bool amdgpu_dm_link_setup_psr(struct dc_stream_state 
> *stream)
>   struct dc_link *link = NULL;
>   struct psr_config psr_config = {0};
>   struct psr_context psr_context = {0};
> - struct dc *dc = NULL;
>   bool ret = false;
>  
>   if (stream == NULL)
>   return false;
>  
>   link = stream->link;
> - dc = link->ctx->dc;
>  
>   psr_config.psr_version = link->dpcd_caps.psr_caps.psr_version;
>  
> 



Re: [PATCH -next] drm/amd/display: Fix gcc unused variable warning

2021-06-17 Thread Harry Wentland
On 2021-06-16 10:31 p.m., Pu Lehui wrote:
> GCC reports the following warning with W=1:
> 
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:3635:17:
> warning:
>  variable ‘status’ set but not used [-Wunused-but-set-variable]
>   3635 |  enum dc_status status = DC_ERROR_UNEXPECTED;
>| ^~
> 
> The variable should be used for error checking; let's fix it.
> 
> Signed-off-by: Pu Lehui 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> index fcb635c85330..cf29265870c8 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> @@ -3681,6 +3681,10 @@ bool dp_retrieve_lttpr_cap(struct dc_link *link)
>   
> DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV,
>   lttpr_dpcd_data,
>   sizeof(lttpr_dpcd_data));
> + if (status != DC_OK) {
> + dm_error("%s: Read LTTPR caps data failed.\n", 
> __func__);
> + return false;
> + }
>  
>   link->dpcd_caps.lttpr_caps.revision.raw =
>   
> lttpr_dpcd_data[DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV -
> 



Re: [PATCH v3] drm/amd/amdgpu: Use IP discovery data to determine VCN enablement instead of MMSCH

2021-06-17 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Peng Ju Zhou 

Sent: Thursday, June 17, 2021 3:46 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Zhang, Bokun 
Subject: [PATCH v3] drm/amd/amdgpu: Use IP discovery data to determine VCN 
enablement instead of MMSCH

From: Bokun Zhang 

In the past, we used MMSCH to determine whether a VCN is enabled or not.
This is not reliable since after a FLR, MMSCH may report junk data.

It is better to use IP discovery data.

Signed-off-by: Bokun Zhang 
Signed-off-by: Peng Ju Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |  8 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 23 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h   | 13 +
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 53 +--
 5 files changed, 61 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index f949ed8bfd9e..e02405a24fe3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -373,6 +373,14 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device 
*adev, int hw_id, int n
 return -EINVAL;
 }

+
+int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
vcn_instance,
+int *major, int *minor, int *revision)
+{
+   return amdgpu_discovery_get_ip_version(adev, VCN_HWID,
+  vcn_instance, major, minor, 
revision);
+}
+
 void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
 {
 struct binary_header *bhdr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
index 02e340cd3a38..48e6b88cfdfe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
@@ -32,6 +32,9 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device 
*adev);
 void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev);
 int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int 
number_instance,
 int *major, int *minor, int *revision);
+
+int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
vcn_instance,
+int *major, int *minor, int *revision);
 int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev);

 #endif /* __AMDGPU_DISCOVERY__ */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 9492b505e69b..84b025405578 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -287,6 +287,29 @@ int amdgpu_vcn_sw_fini(struct amdgpu_device *adev)
 return 0;
 }

+bool amdgpu_vcn_is_disabled_vcn(struct amdgpu_device *adev, enum vcn_ring_type 
type, uint32_t vcn_instance)
+{
+   bool ret = false;
+
+   int major;
+   int minor;
+   int revision;
+
+   /* if cannot find IP data, then this VCN does not exist */
+   if (amdgpu_discovery_get_vcn_version(adev, vcn_instance,
+   &major, &minor, &revision) != 0)
+   return true;
+
+   if ((type == VCN_ENCODE_RING) && (revision & 
VCN_BLOCK_ENCODE_DISABLE_MASK)) {
+   ret = true;
+   } else if ((type == VCN_DECODE_RING) && (revision & 
VCN_BLOCK_DECODE_DISABLE_MASK)) {
+   ret = true;
+   } else if ((type == VCN_UNIFIED_RING) && (revision & 
VCN_BLOCK_QUEUE_DISABLE_MASK)) {
+   ret = true;
+   }
+
+   return ret;
+}
+
 int amdgpu_vcn_suspend(struct amdgpu_device *adev)
 {
 unsigned size;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
index bc76cab67697..d74c62b49795 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
@@ -280,6 +280,16 @@ struct amdgpu_vcn_decode_buffer {
 uint32_t pad[30];
 };

+#define VCN_BLOCK_ENCODE_DISABLE_MASK 0x80
+#define VCN_BLOCK_DECODE_DISABLE_MASK 0x40
+#define VCN_BLOCK_QUEUE_DISABLE_MASK 0xC0
+
+enum vcn_ring_type {
+   VCN_ENCODE_RING,
+   VCN_DECODE_RING,
+   VCN_UNIFIED_RING,
+};
+
 int amdgpu_vcn_sw_init(struct amdgpu_device *adev);
 int amdgpu_vcn_sw_fini(struct amdgpu_device *adev);
 int amdgpu_vcn_suspend(struct amdgpu_device *adev);
@@ -287,6 +297,9 @@ int amdgpu_vcn_resume(struct amdgpu_device *adev);
 void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring);
 void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring);

+bool amdgpu_vcn_is_disabled_vcn(struct amdgpu_device *adev,
+   enum vcn_ring_type type, uint32_t vcn_instance);
+
 int amdgpu_vcn_dec_ring_test_ring(struct amdgpu_ring *ring);
 int amdgpu_vcn_dec_ring_test_ib(struct amdgpu_ring *ring, long timeout);
 int amdgpu_vcn_dec_sw_ring_test_ring(struct amdgpu_ring *ring);
diff 

Re: [PATCH 2/2] drm/amdgpu: rework dma_resv handling v3

2021-06-17 Thread Alex Deucher
On Mon, Jun 14, 2021 at 1:45 PM Christian König
 wrote:
>
> Drop the workaround and instead implement a better solution.
>
> Basically we are now chaining all submissions using a dma_fence_chain
> container and adding them as exclusive fence to the dma_resv object.
>
> This way other drivers can still sync to the single exclusive fence
> while amdgpu only syncs to fences from different processes.
>
> v3: add the shared fence first before the exclusive one
>
> Signed-off-by: Christian König 

Series is:
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 62 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
>  6 files changed, 55 insertions(+), 79 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> index a130e766cbdb..c905a4cfc173 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> @@ -34,6 +34,7 @@ struct amdgpu_fpriv;
>  struct amdgpu_bo_list_entry {
> struct ttm_validate_buffer  tv;
> struct amdgpu_bo_va *bo_va;
> +   struct dma_fence_chain  *chain;
> uint32_tpriority;
> struct page **user_pages;
> booluser_invalidated;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9ce649a1a8d3..25655414e9c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -572,6 +572,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
> goto out;
> }
>
> +   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> +
> +   e->bo_va = amdgpu_vm_bo_find(vm, bo);
> +
> +   if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
> +   e->chain = dma_fence_chain_alloc();
> +   if (!e->chain) {
> +   r = -ENOMEM;
> +   goto error_validate;
> +   }
> +   }
> +   }
> +
> +   amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
> +   &p->bytes_moved_vis_threshold);
> p->bytes_moved = 0;
> @@ -599,15 +613,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
> gws = p->bo_list->gws_obj;
> oa = p->bo_list->oa_obj;
>
> -   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> -   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> -
> -   /* Make sure we use the exclusive slot for shared BOs */
> -   if (bo->prime_shared_count)
> -   e->tv.num_shared = 0;
> -   e->bo_va = amdgpu_vm_bo_find(vm, bo);
> -   }
> -
> if (gds) {
> p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
> p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
> @@ -629,8 +634,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
> }
>
>  error_validate:
> -   if (r)
> +   if (r) {
> +   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +   dma_fence_chain_free(e->chain);
> +   e->chain = NULL;
> +   }
> ttm_eu_backoff_reservation(&p->ticket, &p->validated);
> +   }
>  out:
> return r;
>  }
> @@ -670,9 +680,17 @@ static void amdgpu_cs_parser_fini(struct 
> amdgpu_cs_parser *parser, int error,
>  {
> unsigned i;
>
> -   if (error && backoff)
> +   if (error && backoff) {
> +   struct amdgpu_bo_list_entry *e;
> +
> +   amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
> +   dma_fence_chain_free(e->chain);
> +   e->chain = NULL;
> +   }
> +
> ttm_eu_backoff_reservation(>ticket,
>>validated);
> +   }
>
> for (i = 0; i < parser->num_post_deps; i++) {
> drm_syncobj_put(parser->post_deps[i].syncobj);
> @@ -1245,6 +1263,28 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>
> amdgpu_vm_move_to_lru_tail(p->adev, >vm);
>
> +   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> +   struct dma_resv *resv = e->tv.bo->base.resv;
> +   struct dma_fence_chain *chain = e->chain;
> +
> +   if (!chain)
> +   continue;
> +
> +   /*
> +* Work around dma_resv 

Re: [PATCH] drm/amdkfd: Set p2plink non-coherent in topology

2021-06-17 Thread Felix Kuehling
If this is for the DKMS branch, the review should go to our internal
review list.

Regards,
  Felix


Am 2021-06-17 um 10:20 a.m. schrieb Eric Huang:
> Fix the non-coherent bit of the p2plink properties flag,
> which is always 0.
>
> Signed-off-by: Eric Huang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index d390aa369f7b..0705ff5eaa26 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1404,6 +1404,7 @@ static void kfd_fill_iolink_non_crat_info(struct 
> kfd_topology_device *dev)
>  
>   inbound_link->flags = CRAT_IOLINK_FLAGS_ENABLED;
>   kfd_set_iolink_no_atomics(peer_dev, dev, inbound_link);
> + kfd_set_iolink_non_coherent(peer_dev, link, 
> inbound_link);
>   }
>   }
>  }
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_*

2021-06-17 Thread Sierra Guiza, Alejandro (Alex)



On 6/17/2021 10:16 AM, Alex Sierra wrote:

v1:
AMD is building a system architecture for the Frontier supercomputer with a
coherent interconnect between CPUs and GPUs. This hardware architecture allows
the CPUs to coherently access GPU device memory. We have hardware in our labs
and we are working with our partner HPE on the BIOS, firmware and software
for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver looks
it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
using devm_memremap_pages.

Now we're trying to migrate data to and from that memory using the migrate_vma_*
helpers so we can support page-based migration in our unified memory 
allocations,
while also supporting CPU access to those pages.

This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
correctly in the migrate_vma_* helpers. We are looking for feedback about this
approach. If we're close, what's needed to make our patches acceptable upstream?
If we're not close, any suggestions how else to achieve what we are trying to do
(i.e. page migration and coherent CPU access to VRAM)?

This work is based on HMM and our SVM memory manager that was recently 
upstreamed
to Dave Airlie's drm-next branch
https://lore.kernel.org/dri-devel/20210527205606.2660-6-felix.kuehl...@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc

Corrected link:

https://cgit.freedesktop.org/drm/drm/log/?h=drm-next

Regards,
Alex Sierra


On top of that we did some rework of our VRAM management for migrations to 
remove
some incorrect assumptions, allow partially successful migrations and GPU memory
mappings that mix pages in VRAM and system memory.
https://patchwork.kernel.org/project/dri-devel/list/?series=489811


Corrected link:

https://lore.kernel.org/dri-devel/20210527205606.2660-6-felix.kuehl...@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc

Regards,
Alex Sierra



v2:
This patch series version has merged "[RFC PATCH v3 0/2]
mm: remove extra ZONE_DEVICE struct page refcount" patch series made by
Ralph Campbell. It also applies at the top of these series, our changes
to support device generic type in migration_vma helpers.
This has been tested in systems with device memory that has coherent
access by CPU.

Also addresses the following feedback made in v1:
- Isolate in one patch kernel/resource.c modification, based
on Christoph's feedback.
- Add helpers check for generic and private type to avoid
duplicated long lines.

v3:
- Include cover letter from v1
- Rename dax_layout_is_idle_page func to dax_page_unused in patch
ext4/xfs: add page refcount helper

Patches 1-2 Rebased Ralph Campbell's ZONE_DEVICE page refcounting patches
Patches 4-5 are for context to show how we are looking up the SPM
memory and registering it with devmap.
Patches 3,6-8 are the changes we are trying to upstream or rework to
make them acceptable upstream.

Alex Sierra (6):
   kernel: resource: lookup_resource as exported symbol
   drm/amdkfd: add SPM support for SVM
   drm/amdkfd: generic type as sys mem on migration to ram
   include/linux/mm.h: helpers to check zone device generic type
   mm: add generic type support to migrate_vma helpers
   mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

Ralph Campbell (2):
   ext4/xfs: add page refcount helper
   mm: remove extra ZONE_DEVICE struct page refcount

  arch/powerpc/kvm/book3s_hv_uvmem.c   |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 15 --
  drivers/gpu/drm/nouveau/nouveau_dmem.c   |  2 +-
  fs/dax.c |  8 +--
  fs/ext4/inode.c  |  5 +-
  fs/xfs/xfs_file.c|  4 +-
  include/linux/dax.h  | 10 
  include/linux/memremap.h |  7 +--
  include/linux/mm.h   | 52 +++---
  kernel/resource.c|  2 +-
  lib/test_hmm.c   |  2 +-
  mm/internal.h|  8 +++
  mm/memremap.c| 69 +++-
  mm/migrate.c | 13 ++---
  mm/page_alloc.c  |  3 ++
  mm/swap.c| 45 ++--
  16 files changed, 83 insertions(+), 164 deletions(-)




Re: [PATCH 1/2] drm/amdgpu: unwrap fence chains in the explicit sync fence

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 07:30:24PM +0200, Daniel Vetter wrote:
> On Thu, Jun 17, 2021 at 09:44:25AM +0200, Christian König wrote:
> > Alex do want to review those so that we can close the ticket?
> 
> Maybe I'm behind on mails, but 2nd patch still has the issues I think I'm
> seeing ...

Ok with temperatures getting colder towards the night the 2nd patch looks
much better now :-) I replied there.
-Daniel

> -Daniel
> 
> > 
> > Thanks,
> > Christian.
> > 
> > Am 14.06.21 um 19:45 schrieb Christian König:
> > > Unwrap the explicit fence if it is a dma_fence_chain and
> > > sync to the first fence not matching the owner rules.
> > > 
> > > Signed-off-by: Christian König 
> > > Acked-by: Daniel Vetter 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +--
> > >   1 file changed, 68 insertions(+), 50 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > index 1b2ceccaf5b0..862eb3c1c4c5 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > @@ -28,6 +28,8 @@
> > >*Christian König 
> > >*/
> > > +#include 
> > > +
> > >   #include "amdgpu.h"
> > >   #include "amdgpu_trace.h"
> > >   #include "amdgpu_amdkfd.h"
> > > @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, 
> > > struct dma_fence *fence)
> > >   return amdgpu_sync_fence(sync, fence);
> > >   }
> > > +/* Determine based on the owner and mode if we should sync to a fence or 
> > > not */
> > > +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> > > +enum amdgpu_sync_mode mode,
> > > +void *owner, struct dma_fence *f)
> > > +{
> > > + void *fence_owner = amdgpu_sync_get_owner(f);
> > > +
> > > + /* Always sync to moves, no matter what */
> > > + if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > > + return true;
> > > +
> > > + /* We only want to trigger KFD eviction fences on
> > > +  * evict or move jobs. Skip KFD fences otherwise.
> > > +  */
> > > + if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > + owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > + return false;
> > > +
> > > + /* Never sync to VM updates either. */
> > > + if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > + owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > + return false;
> > > +
> > > + /* Ignore fences depending on the sync mode */
> > > + switch (mode) {
> > > + case AMDGPU_SYNC_ALWAYS:
> > > + return true;
> > > +
> > > + case AMDGPU_SYNC_NE_OWNER:
> > > + if (amdgpu_sync_same_dev(adev, f) &&
> > > + fence_owner == owner)
> > > + return false;
> > > + break;
> > > +
> > > + case AMDGPU_SYNC_EQ_OWNER:
> > > + if (amdgpu_sync_same_dev(adev, f) &&
> > > + fence_owner != owner)
> > > + return false;
> > > + break;
> > > +
> > > + case AMDGPU_SYNC_EXPLICIT:
> > > + return false;
> > > + }
> > > +
> > > + WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > +  "Adding eviction fence to sync obj");
> > > + return true;
> > > +}
> > > +
> > >   /**
> > >* amdgpu_sync_resv - sync to a reservation object
> > >*
> > > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, 
> > > struct amdgpu_sync *sync,
> > >   /* always sync to the exclusive fence */
> > >   f = dma_resv_excl_fence(resv);
> > > - r = amdgpu_sync_fence(sync, f);
> > > + dma_fence_chain_for_each(f, f) {
> > > + struct dma_fence_chain *chain = to_dma_fence_chain(f);
> > > +
> > > + if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
> > > +chain->fence : f)) {
> > > + r = amdgpu_sync_fence(sync, f);
> > > + dma_fence_put(f);
> > > + if (r)
> > > + return r;
> > > + break;
> > > + }
> > > + }
> > >   flist = dma_resv_shared_list(resv);
> > > - if (!flist || r)
> > > - return r;
> > > + if (!flist)
> > > + return 0;
> > >   for (i = 0; i < flist->shared_count; ++i) {
> > > - void *fence_owner;
> > > -
> > >   f = rcu_dereference_protected(flist->shared[i],
> > > dma_resv_held(resv));
> > > - fence_owner = amdgpu_sync_get_owner(f);
> > > -
> > > - /* Always sync to moves, no matter what */
> > > - if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
> > > + if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
> > >   r = amdgpu_sync_fence(sync, f);
> > >   if (r)
> > > - break;
> > > - }
> > > -
> > > - /* We only want to trigger KFD eviction fences on
> > > -  * evict or move jobs. 

Re: [PATCH 2/2] drm/amdgpu: rework dma_resv handling v3

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 07:45:36PM +0200, Christian König wrote:
> Drop the workaround and instead implement a better solution.
> 
> Basically we are now chaining all submissions using a dma_fence_chain
> container and adding them as exclusive fence to the dma_resv object.
> 
> This way other drivers can still sync to the single exclusive fence
> while amdgpu only sync to fences from different processes.
> 
> v3: add the shared fence first before the exclusive one
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 62 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
>  6 files changed, 55 insertions(+), 79 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> index a130e766cbdb..c905a4cfc173 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> @@ -34,6 +34,7 @@ struct amdgpu_fpriv;
>  struct amdgpu_bo_list_entry {
>   struct ttm_validate_buffer  tv;
>   struct amdgpu_bo_va *bo_va;
> + struct dma_fence_chain  *chain;
>   uint32_tpriority;
>   struct page **user_pages;
>   booluser_invalidated;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 9ce649a1a8d3..25655414e9c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -572,6 +572,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
>   goto out;
>   }
>  
> + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> + struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> +
> + e->bo_va = amdgpu_vm_bo_find(vm, bo);
> +
> + if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
> + e->chain = dma_fence_chain_alloc();
> + if (!e->chain) {
> + r = -ENOMEM;
> + goto error_validate;
> + }
> + }
> + }
> +
>   amdgpu_cs_get_threshold_for_moves(p->adev, >bytes_moved_threshold,
> >bytes_moved_vis_threshold);
>   p->bytes_moved = 0;
> @@ -599,15 +613,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
>   gws = p->bo_list->gws_obj;
>   oa = p->bo_list->oa_obj;
>  
> - amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> - struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> -
> - /* Make sure we use the exclusive slot for shared BOs */
> - if (bo->prime_shared_count)
> - e->tv.num_shared = 0;
> - e->bo_va = amdgpu_vm_bo_find(vm, bo);
> - }
> -
>   if (gds) {
>   p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
>   p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
> @@ -629,8 +634,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
>   }
>  
>  error_validate:
> - if (r)
> + if (r) {
> + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> + dma_fence_chain_free(e->chain);
> + e->chain = NULL;
> + }
>   ttm_eu_backoff_reservation(>ticket, >validated);
> + }
>  out:
>   return r;
>  }
> @@ -670,9 +680,17 @@ static void amdgpu_cs_parser_fini(struct 
> amdgpu_cs_parser *parser, int error,
>  {
>   unsigned i;
>  
> - if (error && backoff)
> + if (error && backoff) {
> + struct amdgpu_bo_list_entry *e;
> +
> + amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
> + dma_fence_chain_free(e->chain);
> + e->chain = NULL;
> + }
> +
>   ttm_eu_backoff_reservation(>ticket,
>  >validated);
> + }
>  
>   for (i = 0; i < parser->num_post_deps; i++) {
>   drm_syncobj_put(parser->post_deps[i].syncobj);
> @@ -1245,6 +1263,28 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>  
>   amdgpu_vm_move_to_lru_tail(p->adev, >vm);
>  
> + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> + struct dma_resv *resv = e->tv.bo->base.resv;
> + struct dma_fence_chain *chain = e->chain;
> +
> + if (!chain)
> + continue;
> +
> + /*
> +  * Work around dma_resv shortcommings by wrapping up the
> +  * submission in a dma_fence_chain and add it as exclusive
> +  * fence, but first add the 

[PATCH umr 1/3] Improve handling of non-standard page tables in AI+

2021-06-17 Thread Joseph Greathouse
Fixes handling of GPUVM page table decoding when not using 4-level
page tables with 512 entries per level. This includes:

- Calculating actual size of top-most PDB based on total VM range,
  page table depth, and page table block size.
- Calculating size of PTB based on the page table block size
  and the PDE0's block fragment size.
- Handling PTE offset and masks from PDE0 with P-bit, normal
  PTBs, or PTBs from a translate-further layer.
- When using a PTE with F bit to go one layer deeper, pull new
  block fragment size out of that PTE to handle further-level PTBs
  of non-standard sizes.

Signed-off-by: Joseph Greathouse 
---
 src/lib/read_vram.c | 199 ++--
 1 file changed, 153 insertions(+), 46 deletions(-)

diff --git a/src/lib/read_vram.c b/src/lib/read_vram.c
index efcd081..049acd4 100644
--- a/src/lib/read_vram.c
+++ b/src/lib/read_vram.c
@@ -297,6 +297,26 @@ invalid_page:
return -1;
 }
 
+/** round_up_pot -- Round up value to the next power of two (minimum 64 MiB) */
+static uint64_t round_up_pot(uint64_t x)
+{
+   uint64_t y = (64ULL * 1024 * 1024); // start at 64MiB
+   while (y < x)
+   y <<= 1;
+   return y;
+}
+
+static uint64_t log2_vm_size(uint64_t page_table_start_addr, uint64_t page_table_end_addr)
+{
+   uint64_t size_of_vm_bytes = page_table_end_addr - page_table_start_addr + 4096;
+   size_of_vm_bytes = round_up_pot(size_of_vm_bytes);
+   // Find the highest bit set to get an estimate for log2(size)
+   uint32_t vm_bits = 0;
+   while (size_of_vm_bytes >>= 1)
+   vm_bits++;
+   return vm_bits;
+}
+
 /**
  * umr_access_vram_ai - Access GPU mapped memory for GFX9+ platforms
  */
@@ -304,17 +324,19 @@ static int umr_access_vram_ai(struct umr_asic *asic, uint32_t vmid,
  uint64_t address, uint32_t size,
  void *dst, int write_en)
 {
-   uint64_t start_addr, page_table_start_addr, page_table_base_addr,
-page_table_block_size, pte_idx, pde_idx, pte_entry, pde_entry,
+   uint64_t start_addr, page_table_start_addr, page_table_end_addr, page_table_base_addr,
+page_table_block_size, log2_ptb_entries, pte_idx, pde_idx, pte_entry, pde_entry,
 pde_address, vm_fb_offset,
 va_mask, offset_mask, system_aperture_low, system_aperture_high,
-fb_top, fb_bottom, pte_page_mask, agp_base, agp_bot, agp_top, prev_addr;
+fb_top, fb_bottom, ptb_mask, pte_page_mask, agp_base, agp_bot, agp_top, prev_addr;
uint32_t chunk_size, tmp, pde0_block_fragment_size;
int pde_cnt, current_depth, page_table_depth, zfb, further;
struct {
uint32_t
mmVM_CONTEXTx_PAGE_TABLE_START_ADDR_LO32,
mmVM_CONTEXTx_PAGE_TABLE_START_ADDR_HI32,
+   mmVM_CONTEXTx_PAGE_TABLE_END_ADDR_LO32,
+   mmVM_CONTEXTx_PAGE_TABLE_END_ADDR_HI32,
mmVM_CONTEXTx_CNTL,
mmVM_CONTEXTx_PAGE_TABLE_BASE_ADDR_LO32,
mmVM_CONTEXTx_PAGE_TABLE_BASE_ADDR_HI32,
@@ -461,6 +483,12 @@ static int umr_access_vram_ai(struct umr_asic *asic, uint32_t vmid,
	sprintf(buf, "mm%sVM_CONTEXT%" PRIu32 "_PAGE_TABLE_START_ADDR_HI32", regprefix, vmid);
	registers.mmVM_CONTEXTx_PAGE_TABLE_START_ADDR_HI32 = umr_read_reg_by_name_by_ip(asic, hub, buf);
	page_table_start_addr |= (uint64_t)registers.mmVM_CONTEXTx_PAGE_TABLE_START_ADDR_HI32 << 44;
+   sprintf(buf, "mm%sVM_CONTEXT%" PRIu32 "_PAGE_TABLE_END_ADDR_LO32", regprefix, vmid);
+   registers.mmVM_CONTEXTx_PAGE_TABLE_END_ADDR_LO32 = umr_read_reg_by_name_by_ip(asic, hub, buf);
+   page_table_end_addr = (uint64_t)registers.mmVM_CONTEXTx_PAGE_TABLE_END_ADDR_LO32 << 12;
+   sprintf(buf, "mm%sVM_CONTEXT%" PRIu32 "_PAGE_TABLE_END_ADDR_HI32", regprefix, vmid);
+   registers.mmVM_CONTEXTx_PAGE_TABLE_END_ADDR_HI32 = umr_read_reg_by_name_by_ip(asic, hub, buf);
+   page_table_end_addr |= (uint64_t)registers.mmVM_CONTEXTx_PAGE_TABLE_END_ADDR_HI32 << 44;
 
	sprintf(buf, "mm%sVM_CONTEXT%" PRIu32 "_CNTL", regprefix, vmid);
	tmp = registers.mmVM_CONTEXTx_CNTL = umr_read_reg_by_name_by_ip(asic, hub, buf);
@@ -495,6 +523,8 @@ static int umr_access_vram_ai(struct umr_asic *asic, uint32_t vmid,
	asic->mem_funcs.vm_message(
		"mm%sVM_CONTEXT%" PRIu32 "_PAGE_TABLE_START_ADDR_LO32=0x%" PRIx32 "\n"
		"mm%sVM_CONTEXT%" PRIu32 "_PAGE_TABLE_START_ADDR_HI32=0x%" PRIx32 "\n"
+		"mm%sVM_CONTEXT%" PRIu32 "_PAGE_TABLE_END_ADDR_LO32=0x%" PRIx32 "\n"
+		"mm%sVM_CONTEXT%" PRIu32 "_PAGE_TABLE_END_ADDR_HI32=0x%" PRIx32 "\n"

Re: [PATCH][next] drm/amd/display: Fix fall-through warning for Clang

2021-06-17 Thread Harry Wentland



On 2021-06-16 4:52 p.m., Gustavo A. R. Silva wrote:
> In preparation to enable -Wimplicit-fallthrough for Clang, fix
> the following warning by replacing a /* fall through */ comment
> with the new pseudo-keyword macro fallthrough:
> 
> drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:672:4: warning: 
> unannotated fall-through between switch labels [-Wimplicit-fallthrough]
> case AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER:
> ^
> 
> Notice that Clang doesn't recognize /* fall through */ comments as
> implicit fall-through markings, so in order to globally enable
> -Wimplicit-fallthrough for Clang, these comments need to be
> replaced with fallthrough; in the whole codebase.
> 
> Link: https://github.com/KSPP/linux/issues/115
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Harry Wentland 

Harry

> ---
> JFYI: We had thousands of these sorts of warnings and now we are down
>   to just 15 in linux-next. This is one of those last remaining
>   warnings.
> 
>  drivers/gpu/drm/amd/display/dc/dce/dce_aux.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c 
> b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
> index 28631714f697..2fb88e54a4bf 100644
> --- a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
> +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
> @@ -668,7 +668,7 @@ bool dce_aux_transfer_with_retries(struct ddc_service 
> *ddc,
>   /* polling_timeout_period is in us */
>   defer_time_in_ms += 
> aux110->polling_timeout_period / 1000;
>   ++aux_defer_retries;
> - /* fall through */
> + fallthrough;
>   case AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER:
>   retry_on_defer = true;
>   fallthrough;
> 



[PATCH v3 5/8] drm/amdkfd: generic type as sys mem on migration to ram

2021-06-17 Thread Alex Sierra
Device memory of generic type is accessed by the CPU in the
same way as system RAM. For migrations from VRAM to RAM, the
migrate flags select the migration source, which for the
generic type should be MIGRATE_VMA_SELECT_SYSTEM.

Signed-off-by: Alex Sierra 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index f5939449a99f..7b41006c1164 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -653,8 +653,9 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
migrate.vma = vma;
migrate.start = start;
migrate.end = end;
-   migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
+   migrate.flags = adev->gmc.xgmi.connected_to_cpu ?
+   MIGRATE_VMA_SELECT_SYSTEM : MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 
size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t);
size *= npages;
-- 
2.17.1



[PATCH v3 1/8] ext4/xfs: add page refcount helper

2021-06-17 Thread Alex Sierra
From: Ralph Campbell 

There are several places where ZONE_DEVICE struct pages assume a reference
count == 1 means the page is idle and free. Instead of open coding this,
add a helper function to hide this detail.

v2:
[AS]: rename dax_layout_is_idle_page func to dax_page_unused

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
 fs/dax.c|  4 ++--
 fs/ext4/inode.c |  5 +
 fs/xfs/xfs_file.c   |  4 +---
 include/linux/dax.h | 10 ++
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 26d5dcd2d69e..321f4ddc6643 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -358,7 +358,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);
 
-   WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
+   WARN_ON_ONCE(trunc && !dax_layout_is_idle_page(page));
WARN_ON_ONCE(page->mapping && page->mapping != mapping);
page->mapping = NULL;
page->index = 0;
@@ -372,7 +372,7 @@ static struct page *dax_busy_page(void *entry)
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);
 
-   if (page_ref_count(page) > 1)
+   if (!dax_layout_is_idle_page(page))
return page;
}
return NULL;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c173c8405856..9ee00186412f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3972,10 +3972,7 @@ int ext4_break_layouts(struct inode *inode)
if (!page)
return 0;
 
-   error = ___wait_var_event(>_refcount,
-   atomic_read(>_refcount) == 1,
-   TASK_INTERRUPTIBLE, 0, 0,
-   ext4_wait_dax_page(ei));
+   error = dax_wait_page(ei, page, ext4_wait_dax_page);
} while (error == 0);
 
return error;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f73837..39565fe5f817 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -782,9 +782,7 @@ xfs_break_dax_layouts(
return 0;
 
*retry = true;
-   return ___wait_var_event(>_refcount,
-   atomic_read(>_refcount) == 1, TASK_INTERRUPTIBLE,
-   0, 0, xfs_wait_dax_page(inode));
+   return dax_wait_page(inode, page, xfs_wait_dax_page);
 }
 
 int
diff --git a/include/linux/dax.h b/include/linux/dax.h
index b52f084aa643..8b5da1d60dbc 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -243,6 +243,16 @@ static inline bool dax_mapping(struct address_space *mapping)
return mapping->host && IS_DAX(mapping->host);
 }
 
+static inline bool dax_page_unused(struct page *page)
+{
+   return page_ref_count(page) == 1;
+}
+
+#define dax_wait_page(_inode, _page, _wait_cb) \
+   ___wait_var_event(&(_page)->_refcount,  \
+   dax_page_unused(_page), \
+   TASK_INTERRUPTIBLE, 0, 0, _wait_cb(_inode))
+
 #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
 void hmem_register_device(int target_nid, struct resource *r);
 #else
-- 
2.17.1



[PATCH v3 6/8] include/linux/mm.h: helpers to check zone device generic type

2021-06-17 Thread Alex Sierra
Two helpers are added. One checks whether a zone device page is of
generic type; the other checks whether a page is of either private or
generic type.

Signed-off-by: Alex Sierra 
---
 include/linux/mm.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d8d79bb94be8..f5b247a63044 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1125,6 +1125,14 @@ static inline bool is_device_private_page(const struct page *page)
page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
 
+static inline bool is_device_page(const struct page *page)
+{
+   return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+   is_zone_device_page(page) &&
+   (page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
+page->pgmap->type == MEMORY_DEVICE_GENERIC);
+}
+
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
-- 
2.17.1



[PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_*

2021-06-17 Thread Alex Sierra
v1:
AMD is building a system architecture for the Frontier supercomputer with a
coherent interconnect between CPUs and GPUs. This hardware architecture allows
the CPUs to coherently access GPU device memory. We have hardware in our labs
and we are working with our partner HPE on the BIOS, firmware and software
for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver looks
it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
using devm_memremap_pages.

Now we're trying to migrate data to and from that memory using the migrate_vma_*
helpers so we can support page-based migration in our unified memory 
allocations,
while also supporting CPU access to those pages.

This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
correctly in the migrate_vma_* helpers. We are looking for feedback about this
approach. If we're close, what's needed to make our patches acceptable upstream?
If we're not close, any suggestions how else to achieve what we are trying to do
(i.e. page migration and coherent CPU access to VRAM)?

This work is based on HMM and our SVM memory manager that was recently 
upstreamed
to Dave Airlie's drm-next branch
https://lore.kernel.org/dri-devel/20210527205606.2660-6-felix.kuehl...@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc
On top of that we did some rework of our VRAM management for migrations to 
remove
some incorrect assumptions, allow partially successful migrations and GPU memory
mappings that mix pages in VRAM and system memory.
https://patchwork.kernel.org/project/dri-devel/list/?series=489811

v2:
This version of the patch series merges the "[RFC PATCH v3 0/2]
mm: remove extra ZONE_DEVICE struct page refcount" series by
Ralph Campbell. On top of it, we apply our changes to support the
device generic type in the migrate_vma helpers.
This has been tested on systems with device memory that is
coherently accessible by the CPU.

Also addresses the following feedback made in v1:
- Isolate in one patch kernel/resource.c modification, based
on Christoph's feedback.
- Add helpers check for generic and private type to avoid
duplicated long lines.

v3:
- Include cover letter from v1
- Rename dax_layout_is_idle_page func to dax_page_unused in patch
ext4/xfs: add page refcount helper

Patches 1-2 Rebased Ralph Campbell's ZONE_DEVICE page refcounting patches
Patches 4-5 are for context to show how we are looking up the SPM 
memory and registering it with devmap.
Patches 3,6-8 are the changes we are trying to upstream or rework to 
make them acceptable upstream.

Alex Sierra (6):
  kernel: resource: lookup_resource as exported symbol
  drm/amdkfd: add SPM support for SVM
  drm/amdkfd: generic type as sys mem on migration to ram
  include/linux/mm.h: helpers to check zone device generic type
  mm: add generic type support to migrate_vma helpers
  mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

Ralph Campbell (2):
  ext4/xfs: add page refcount helper
  mm: remove extra ZONE_DEVICE struct page refcount

 arch/powerpc/kvm/book3s_hv_uvmem.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 15 --
 drivers/gpu/drm/nouveau/nouveau_dmem.c   |  2 +-
 fs/dax.c |  8 +--
 fs/ext4/inode.c  |  5 +-
 fs/xfs/xfs_file.c|  4 +-
 include/linux/dax.h  | 10 
 include/linux/memremap.h |  7 +--
 include/linux/mm.h   | 52 +++---
 kernel/resource.c|  2 +-
 lib/test_hmm.c   |  2 +-
 mm/internal.h|  8 +++
 mm/memremap.c| 69 +++-
 mm/migrate.c | 13 ++---
 mm/page_alloc.c  |  3 ++
 mm/swap.c| 45 ++--
 16 files changed, 83 insertions(+), 164 deletions(-)

-- 
2.17.1



[PATCH v3 3/8] kernel: resource: lookup_resource as exported symbol

2021-06-17 Thread Alex Sierra
The AMD architecture for the Frontier supercomputer will
have device memory which can be coherently accessed by
the CPU. The system BIOS advertises this memory as SPM
(special purpose memory) in the UEFI system address map.

The AMDGPU driver needs to be able to lookup this resource
in order to claim it as MEMORY_DEVICE_GENERIC using
devm_memremap_pages.

Signed-off-by: Alex Sierra 
---
 kernel/resource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..269489bb7097 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -783,7 +783,7 @@ struct resource *lookup_resource(struct resource *root, resource_size_t start)
 
return res;
 }
-
+EXPORT_SYMBOL_GPL(lookup_resource);
 /*
  * Insert a resource into the resource tree. If successful, return NULL,
  * otherwise return the conflicting resource (compare to __request_resource())
-- 
2.17.1



[PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount

2021-06-17 Thread Alex Sierra
From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
 fs/dax.c   |  4 +-
 include/linux/dax.h|  2 +-
 include/linux/memremap.h   |  7 +--
 include/linux/mm.h | 44 -
 lib/test_hmm.c |  2 +-
 mm/internal.h  |  8 +++
 mm/memremap.c  | 68 +++---
 mm/migrate.c   |  5 --
 mm/page_alloc.c|  3 ++
 mm/swap.c  | 45 ++---
 12 files changed, 45 insertions(+), 147 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..acee67710620 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -711,7 +711,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
 
dpage = pfn_to_page(uvmem_pfn);
dpage->zone_device_data = pvt;
-   get_page(dpage);
+   init_page_count(dpage);
lock_page(dpage);
return dpage;
 out_clear:
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 92987daa5e17..8bc7120e1216 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -324,7 +324,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
return NULL;
}
 
-   get_page(page);
+   init_page_count(page);
lock_page(page);
return page;
 }
diff --git a/fs/dax.c b/fs/dax.c
index 321f4ddc6643..7b4c6b35b098 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -560,14 +560,14 @@ static void *grab_mapping_entry(struct xa_state *xas,
 
 /**
  * dax_layout_busy_page_range - find first pinned page in @mapping
- * @mapping: address space to scan for a page with ref count > 1
+ * @mapping: address space to scan for a page with ref count > 0
  * @start: Starting offset. Page containing 'start' is included.
  * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX,
  *   pages from 'start' till the end of file are included.
  *
  * DAX requires ZONE_DEVICE mapped pages. These pages are never
  * 'onlined' to the page allocator so they are considered idle when
- * page->count == 1. A filesystem uses this interface to determine if
+ * page->count == 0. A filesystem uses this interface to determine if
  * any page in the mapping is busy, i.e. for DMA, or other
  * get_user_pages() usages.
  *
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8b5da1d60dbc..05fc982ce153 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -245,7 +245,7 @@ static inline bool dax_mapping(struct address_space 
*mapping)
 
 static inline bool dax_page_unused(struct page *page)
 {
-   return page_ref_count(page) == 1;
+   return page_ref_count(page) == 0;
 }
 
 #define dax_wait_page(_inode, _page, _wait_cb) \
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 79c49e7f5c30..327f32427d21 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -66,9 +66,10 @@ enum memory_type {
 
 struct dev_pagemap_ops {
/*
-* Called once the page refcount reaches 1.  (ZONE_DEVICE pages never
-* reach 0 refcount unless there is a refcount bug. This allows the
-* device driver to implement its own memory management.)
+* Called once the page refcount reaches 0. The reference count
+* should be reset to one with init_page_count(page) before reusing
+* the page. This allows the device driver to implement its own
+* memory management.
 */
void (*page_free)(struct page *page);
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c9900aedc195..d8d79bb94be8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1117,39 +1117,6 @@ static inline bool is_zone_device_page(const struct page 
*page)
 }
 #endif
 
-#ifdef CONFIG_DEV_PAGEMAP_OPS
-void free_devmap_managed_page(struct page *page);
-DECLARE_STATIC_KEY_FALSE(devmap_managed_key);
-
-static inline bool page_is_devmap_managed(struct page *page)
-{
-   if (!static_branch_unlikely(_managed_key))
-   return false;
-   if (!is_zone_device_page(page))
-   return false;
-   switch (page->pgmap->type) {
-   case MEMORY_DEVICE_PRIVATE:
-   case 

[PATCH v3 8/8] mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

2021-06-17 Thread Alex Sierra
Add a MEMORY_DEVICE_GENERIC case to the free_zone_device_page
callback so that device generic type memory can free its
pages properly.

Signed-off-by: Alex Sierra 
---
 mm/memremap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memremap.c b/mm/memremap.c
index 614b3d600e95..6c884e2542a9 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -438,7 +438,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 EXPORT_SYMBOL_GPL(get_dev_pagemap);
 
 #ifdef CONFIG_DEV_PAGEMAP_OPS
-static void free_device_private_page(struct page *page)
+static void free_device_page(struct page *page)
 {
 
__ClearPageWaiters(page);
@@ -477,7 +477,8 @@ void free_zone_device_page(struct page *page)
wake_up_var(>_refcount);
return;
case MEMORY_DEVICE_PRIVATE:
-   free_device_private_page(page);
+   case MEMORY_DEVICE_GENERIC:
+   free_device_page(page);
return;
default:
return;
-- 
2.17.1



[PATCH v3 4/8] drm/amdkfd: add SPM support for SVM

2021-06-17 Thread Alex Sierra
When the CPU is connected through XGMI, it has coherent
access to the VRAM resource. In this case that resource
is taken from a table in the device GMC aperture base.
This resource is used along with the device type, which could
be DEVICE_PRIVATE or DEVICE_GENERIC, to create the device
page map region.

Signed-off-by: Alex Sierra 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index c8ca3252cbc2..f5939449a99f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -895,6 +895,7 @@ int svm_migrate_init(struct amdgpu_device *adev)
struct resource *res;
unsigned long size;
void *r;
+   bool xgmi_connected_to_cpu = adev->gmc.xgmi.connected_to_cpu;
 
/* Page migration works on Vega10 or newer */
if (kfddev->device_info->asic_family < CHIP_VEGA10)
@@ -907,17 +908,22 @@ int svm_migrate_init(struct amdgpu_device *adev)
 * should remove reserved size
 */
size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
-   res = devm_request_free_mem_region(adev->dev, _resource, size);
+   if (xgmi_connected_to_cpu)
+   res = lookup_resource(_resource, adev->gmc.aper_base);
+   else
+   res = devm_request_free_mem_region(adev->dev, _resource, 
size);
+
if (IS_ERR(res))
return -ENOMEM;
 
-   pgmap->type = MEMORY_DEVICE_PRIVATE;
pgmap->nr_range = 1;
pgmap->range.start = res->start;
pgmap->range.end = res->end;
+   pgmap->type = xgmi_connected_to_cpu ?
+   MEMORY_DEVICE_GENERIC : MEMORY_DEVICE_PRIVATE;
pgmap->ops = _migrate_pgmap_ops;
pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev);
-   pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+   pgmap->flags = 0;
r = devm_memremap_pages(adev->dev, pgmap);
if (IS_ERR(r)) {
pr_err("failed to register HMM device memory\n");
-- 
2.17.1



[PATCH v3 7/8] mm: add generic type support to migrate_vma helpers

2021-06-17 Thread Alex Sierra
Add the device generic type case to the migrate_vma_pages and
migrate_vma_check_page helpers.
Both generic and private device types have the same
conditions for deciding whether to migrate pages from/to device
memory.

Signed-off-by: Alex Sierra 
---
 mm/migrate.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 8c2430d3e77b..3b6aaba96fe6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2602,7 +2602,7 @@ static bool migrate_vma_check_page(struct page *page)
 * FIXME proper solution is to rework migration_entry_wait() so
 * it does not need to take a reference on page.
 */
-   return is_device_private_page(page);
+   return is_device_page(page);
}
 
/* For file back page */
@@ -3064,10 +3064,10 @@ void migrate_vma_pages(struct migrate_vma *migrate)
mapping = page_mapping(page);
 
if (is_zone_device_page(newpage)) {
-   if (is_device_private_page(newpage)) {
+   if (is_device_page(newpage)) {
/*
-* For now only support private anonymous when
-* migrating to un-addressable device memory.
+* For now only support private and generic
+* anonymous when migrating to device memory.
 */
if (mapping) {
migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
-- 
2.17.1



Re: [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount

2021-06-17 Thread Ralph Campbell



On 6/17/21 8:16 AM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
  arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
  fs/dax.c   |  4 +-
  include/linux/dax.h|  2 +-
  include/linux/memremap.h   |  7 +--
  include/linux/mm.h | 44 -
  lib/test_hmm.c |  2 +-
  mm/internal.h  |  8 +++
  mm/memremap.c  | 68 +++---
  mm/migrate.c   |  5 --
  mm/page_alloc.c|  3 ++
  mm/swap.c  | 45 ++---
  12 files changed, 45 insertions(+), 147 deletions(-)


I think it is great that you are picking this up and trying to revive it.

However, I have a number of concerns about how it affects existing ZONE_DEVICE
MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX users and I don't see this
addressing them. For example, dev_dax_probe() allocates MEMORY_DEVICE_GENERIC
struct pages and then:
  dev_dax_fault()
dev_dax_huge_fault()
  __dev_dax_pte_fault()
vmf_insert_mixed()
which just inserts the PFN into the CPU page tables without increasing the page
refcount so it is zero (whereas it was one before). But using get_page() will
trigger VM_BUG_ON_PAGE() if it is enabled. There isn't any current notion of
free verses allocated for these struct pages. I suppose init_page_count()
could be called on all the struct pages in dev_dax_probe() to fix that though.

I'm even less clear about how to fix MEMORY_DEVICE_FS_DAX. File systems have
clear allocate and free states for backing storage but there are the
complications with the page cache references, etc. to consider. The >1 to 1
reference count seems to be used to tell when a page is idle (no I/O, reclaim
scanners) rather than free (not allocated to any file) but I'm not 100% sure
about that since I don't really understand all the issues around why a file
system needs to have a DAX mount option besides knowing that the storage block
size has to be a multiple of the page size.



[PATCH] drm/amdkfd: Set p2plink non-coherent in topology

2021-06-17 Thread Eric Huang
Fix the non-coherent bit of the p2plink properties flag,
which was always 0.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index d390aa369f7b..0705ff5eaa26 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1404,6 +1404,7 @@ static void kfd_fill_iolink_non_crat_info(struct 
kfd_topology_device *dev)
 
inbound_link->flags = CRAT_IOLINK_FLAGS_ENABLED;
kfd_set_iolink_no_atomics(peer_dev, dev, inbound_link);
+   kfd_set_iolink_non_coherent(peer_dev, link, 
inbound_link);
}
}
 }
-- 
2.25.1



[PATCH] drm/amdgpu: remove unused parameter in amdgpu_gart_bind

2021-06-17 Thread Yifan Zhang
After commit 72a616bb953329bd97c6d6d4c64f3f40ed788a36,
pagelist is no longer used in amdgpu_gart_bind. Remove it.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  | 7 +++
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 1091ec5d3592..9fbd1e62948b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -300,7 +300,6 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t 
offset,
  * @adev: amdgpu_device pointer
  * @offset: offset into the GPU's gart aperture
  * @pages: number of pages to bind
- * @pagelist: pages to bind
  * @dma_addr: DMA addresses of pages
  * @flags: page table entry flags
  *
@@ -309,7 +308,7 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t 
offset,
  * Returns 0 for success, -EINVAL for failure.
  */
 int amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
-int pages, struct page **pagelist, dma_addr_t *dma_addr,
+int pages, dma_addr_t *dma_addr,
 uint64_t flags)
 {
if (!adev->gart.ready) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index e104022197ae..6ff87de620db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -64,7 +64,6 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t 
offset,
int pages, dma_addr_t *dma_addr, uint64_t flags,
void *dst);
 int amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
-int pages, struct page **pagelist,
-dma_addr_t *dma_addr, uint64_t flags);
+int pages, dma_addr_t *dma_addr, uint64_t flags);
 void amdgpu_gart_invalidate_tlb(struct amdgpu_device *adev);
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e8033b6f2395..6297363ab740 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -857,7 +857,7 @@ static int amdgpu_ttm_gart_bind(struct amdgpu_device *adev,
uint64_t page_idx = 1;
 
r = amdgpu_gart_bind(adev, gtt->offset, page_idx,
-   ttm->pages, gtt->ttm.dma_address, flags);
+   gtt->ttm.dma_address, flags);
if (r)
goto gart_bind_fail;
 
@@ -871,11 +871,10 @@ static int amdgpu_ttm_gart_bind(struct amdgpu_device 
*adev,
r = amdgpu_gart_bind(adev,
gtt->offset + (page_idx << PAGE_SHIFT),
ttm->num_pages - page_idx,
-   >pages[page_idx],
&(gtt->ttm.dma_address[page_idx]), flags);
} else {
r = amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages,
-ttm->pages, gtt->ttm.dma_address, flags);
+gtt->ttm.dma_address, flags);
}
 
 gart_bind_fail:
@@ -951,7 +950,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
/* bind pages into GART page tables */
gtt->offset = (u64)bo_mem->start << PAGE_SHIFT;
r = amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages,
-   ttm->pages, gtt->ttm.dma_address, flags);
+   gtt->ttm.dma_address, flags);
 
if (r)
DRM_ERROR("failed to bind %u pages at 0x%08llX\n",
-- 
2.25.1



Re: [PATCH 1/2] drm/amdgpu: unwrap fence chains in the explicit sync fence

2021-06-17 Thread Daniel Vetter
On Thu, Jun 17, 2021 at 09:44:25AM +0200, Christian König wrote:
> Alex do want to review those so that we can close the ticket?

Maybe I'm behind on mails, but 2nd patch still has the issues I think I'm
seeing ...
-Daniel

> 
> Thanks,
> Christian.
> 
> Am 14.06.21 um 19:45 schrieb Christian König:
> > Unwrap the explicit fence if it is a dma_fence_chain and
> > sync to the first fence not matching the owner rules.
> > 
> > Signed-off-by: Christian König 
> > Acked-by: Daniel Vetter 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +--
> >   1 file changed, 68 insertions(+), 50 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > index 1b2ceccaf5b0..862eb3c1c4c5 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > @@ -28,6 +28,8 @@
> >*Christian König 
> >*/
> > +#include 
> > +
> >   #include "amdgpu.h"
> >   #include "amdgpu_trace.h"
> >   #include "amdgpu_amdkfd.h"
> > @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, 
> > struct dma_fence *fence)
> > return amdgpu_sync_fence(sync, fence);
> >   }
> > +/* Determine based on the owner and mode if we should sync to a fence or 
> > not */
> > +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> > +  enum amdgpu_sync_mode mode,
> > +  void *owner, struct dma_fence *f)
> > +{
> > +   void *fence_owner = amdgpu_sync_get_owner(f);
> > +
> > +   /* Always sync to moves, no matter what */
> > +   if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > +   return true;
> > +
> > +   /* We only want to trigger KFD eviction fences on
> > +* evict or move jobs. Skip KFD fences otherwise.
> > +*/
> > +   if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > +   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > +   return false;
> > +
> > +   /* Never sync to VM updates either. */
> > +   if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > +   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > +   return false;
> > +
> > +   /* Ignore fences depending on the sync mode */
> > +   switch (mode) {
> > +   case AMDGPU_SYNC_ALWAYS:
> > +   return true;
> > +
> > +   case AMDGPU_SYNC_NE_OWNER:
> > +   if (amdgpu_sync_same_dev(adev, f) &&
> > +   fence_owner == owner)
> > +   return false;
> > +   break;
> > +
> > +   case AMDGPU_SYNC_EQ_OWNER:
> > +   if (amdgpu_sync_same_dev(adev, f) &&
> > +   fence_owner != owner)
> > +   return false;
> > +   break;
> > +
> > +   case AMDGPU_SYNC_EXPLICIT:
> > +   return false;
> > +   }
> > +
> > +   WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > +"Adding eviction fence to sync obj");
> > +   return true;
> > +}
> > +
> >   /**
> >* amdgpu_sync_resv - sync to a reservation object
> >*
> > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, 
> > struct amdgpu_sync *sync,
> > /* always sync to the exclusive fence */
> > f = dma_resv_excl_fence(resv);
> > -   r = amdgpu_sync_fence(sync, f);
> > +   dma_fence_chain_for_each(f, f) {
> > +   struct dma_fence_chain *chain = to_dma_fence_chain(f);
> > +
> > +   if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
> > +  chain->fence : f)) {
> > +   r = amdgpu_sync_fence(sync, f);
> > +   dma_fence_put(f);
> > +   if (r)
> > +   return r;
> > +   break;
> > +   }
> > +   }
> > flist = dma_resv_shared_list(resv);
> > -   if (!flist || r)
> > -   return r;
> > +   if (!flist)
> > +   return 0;
> > for (i = 0; i < flist->shared_count; ++i) {
> > -   void *fence_owner;
> > -
> > f = rcu_dereference_protected(flist->shared[i],
> >   dma_resv_held(resv));
> > -   fence_owner = amdgpu_sync_get_owner(f);
> > -
> > -   /* Always sync to moves, no matter what */
> > -   if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
> > +   if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
> > r = amdgpu_sync_fence(sync, f);
> > if (r)
> > -   break;
> > -   }
> > -
> > -   /* We only want to trigger KFD eviction fences on
> > -* evict or move jobs. Skip KFD fences otherwise.
> > -*/
> > -   if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > -   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > -   continue;
> > -
> > -   /* Never sync to VM updates either. */
> > -   if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > -   owner != 

Re: [PATCH 1/5] dma-buf: fix dma_resv_test_signaled test_all handling

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 07:15:44PM +0200, Christian König wrote:
> Am 11.06.21 um 16:55 schrieb Daniel Vetter:
> > On Fri, Jun 11, 2021 at 04:53:11PM +0200, Christian König wrote:
> > > 
> > > Am 11.06.21 um 16:47 schrieb Daniel Vetter:
> > > > On Fri, Jun 11, 2021 at 02:02:57PM +0200, Christian König wrote:
> > > > > As the name implies if testing all fences is requested we
> > > > > should indeed test all fences and not skip the exclusive
> > > > > one because we see shared ones.
> > > > > 
> > > > > Signed-off-by: Christian König 
> > > > Hm I thought we've had the rule that when both fences exist, then
> > > > collectively the shared ones must signale no earlier than the exclusive
> > > > one.
> > > > 
> > > > That's at least the contract we've implemented in dma_resv.h. But I've
> > > > also found a bunch of drivers who are a lot more yolo on this.
> > > > 
> > > > I think there's a solid case here to just always take all the fences if 
> > > > we
> > > > ask for all the shared ones, but if we go that way then I'd say
> > > > - clear kerneldoc patch to really hammer this in (currently we're not 
> > > > good
> > > > at all in this regard)
> > > > - going through drivers a bit to check for this (I have some of that 
> > > > done
> > > > already in my earlier series, need to respin it and send it out)
> > > > 
> > > > But I'm kinda not seeing why this needs to be in this patch series here.
> > > You mentioned that this is a problem in the last patch and if you ask me
> > > that's just a bug or at least very inconsistent.
> > > 
> > > See dma_resv_wait_timeout() always waits for all fences, including the
> > > exclusive one even if shared ones are present. But 
> > > dma_resv_test_signaled()
> > > ignores the exclusive one if shared ones are present.
> > Hm the only one I thought I've mentioned is that dma_buf_poll doesn't use
> > dma_fence_get_rcu_safe where I think it should. Different problem. I think
> > this is one you spotted.
> > 
> > > The only other driver I could find trying to make use of this is nouveau 
> > > and
> > > I already provided a fix for this as well.
> > i915 also does this, and I think I've found a few more.
> > 
> > > I just think that this is the more defensive approach to fix this and have
> > > at least the core functions consistent on the handling.
> > Oh fully agree, it's just current dma_resv docs aren't the greatest, and
> > hacking on semantics without updating the docs isn't great. Especially
> > when it's ad-hoc.
> 
> Well, when the requirement that shared fences should always signal after the
> exclusive fence is not documented anywhere, then I would say that it is
> naturally allowed to just add any fence to the list of shared fences, and any
> code assuming something else is just broken and needs fixing.

That's not what I meant. I thought the rule is that the shared fences
_together_ need to signal after the exclusive ones. Not each individual
one.

This means that if you have both exclusive  fences and shared fences, and
you want to wait for just the shared fences, then you can ignore the
exclusive ones.

You have a patch series floating around which "fixes" this, but I think
it's incomplete. And I'm pretty sure it's a change of de facto rules, since
not obeying this breaks a bunch of existing code (as you've noticed).
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > Christian.
> > > 
> > > > -Daniel
> > > > 
> > > > > ---
> > > > >drivers/dma-buf/dma-resv.c | 33 -
> > > > >1 file changed, 12 insertions(+), 21 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> > > > > index f26c71747d43..c66bfdde9454 100644
> > > > > --- a/drivers/dma-buf/dma-resv.c
> > > > > +++ b/drivers/dma-buf/dma-resv.c
> > > > > @@ -615,25 +615,21 @@ static inline int 
> > > > > dma_resv_test_signaled_single(struct dma_fence *passed_fence)
> > > > > */
> > > > >bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all)
> > > > >{
> > > > > - unsigned int seq, shared_count;
> > > > > + struct dma_fence *fence;
> > > > > + unsigned int seq;
> > > > >   int ret;
> > > > >   rcu_read_lock();
> > > > >retry:
> > > > >   ret = true;
> > > > > - shared_count = 0;
> > > > >   seq = read_seqcount_begin(>seq);
> > > > >   if (test_all) {
> > > > >   struct dma_resv_list *fobj = dma_resv_shared_list(obj);
> > > > > - unsigned int i;
> > > > > -
> > > > > - if (fobj)
> > > > > - shared_count = fobj->shared_count;
> > > > > + unsigned int i, shared_count;
> > > > > + shared_count = fobj ? fobj->shared_count : 0;
> > > > >   for (i = 0; i < shared_count; ++i) {
> > > > > - struct dma_fence *fence;
> > > > > -
> > > > >   fence = rcu_dereference(fobj->shared[i]);
> > > > >   ret = 

Re: [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence

2021-06-17 Thread Daniel Vetter
On Mon, Jun 14, 2021 at 09:25:44AM +0200, Christian König wrote:
> Am 11.06.21 um 17:18 schrieb Daniel Vetter:
> > On Fri, Jun 11, 2021 at 12:09:19PM +0200, Christian König wrote:
> > > Am 11.06.21 um 11:07 schrieb Daniel Vetter:
> > > > On Thu, Jun 10, 2021 at 11:17:59AM +0200, Christian König wrote:
> > > > > Unwrap the explicit fence if it is a dma_fence_chain and
> > > > > sync to the first fence not matching the owner rules.
> > > > > 
> > > > > Signed-off-by: Christian König 
> > > > > ---
> > > > >drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 
> > > > > +--
> > > > >1 file changed, 68 insertions(+), 50 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > index 1b2ceccaf5b0..862eb3c1c4c5 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > > > @@ -28,6 +28,8 @@
> > > > > *Christian König 
> > > > > */
> > > > > +#include 
> > > > > +
> > > > >#include "amdgpu.h"
> > > > >#include "amdgpu_trace.h"
> > > > >#include "amdgpu_amdkfd.h"
> > > > > @@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync 
> > > > > *sync, struct dma_fence *fence)
> > > > >   return amdgpu_sync_fence(sync, fence);
> > > > >}
> > > > > +/* Determine based on the owner and mode if we should sync to a 
> > > > > fence or not */
> > > > > +static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
> > > > > +enum amdgpu_sync_mode mode,
> > > > > +void *owner, struct dma_fence *f)
> > > > > +{
> > > > > + void *fence_owner = amdgpu_sync_get_owner(f);
> > > > > +
> > > > > + /* Always sync to moves, no matter what */
> > > > > + if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > + return true;
> > > > > +
> > > > > + /* We only want to trigger KFD eviction fences on
> > > > > +  * evict or move jobs. Skip KFD fences otherwise.
> > > > > +  */
> > > > > + if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> > > > > + owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > + return false;
> > > > > +
> > > > > + /* Never sync to VM updates either. */
> > > > > + if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
> > > > > + owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> > > > > + return false;
> > > > > +
> > > > > + /* Ignore fences depending on the sync mode */
> > > > > + switch (mode) {
> > > > > + case AMDGPU_SYNC_ALWAYS:
> > > > > + return true;
> > > > > +
> > > > > + case AMDGPU_SYNC_NE_OWNER:
> > > > > + if (amdgpu_sync_same_dev(adev, f) &&
> > > > > + fence_owner == owner)
> > > > > + return false;
> > > > > + break;
> > > > > +
> > > > > + case AMDGPU_SYNC_EQ_OWNER:
> > > > > + if (amdgpu_sync_same_dev(adev, f) &&
> > > > > + fence_owner != owner)
> > > > > + return false;
> > > > > + break;
> > > > > +
> > > > > + case AMDGPU_SYNC_EXPLICIT:
> > > > > + return false;
> > > > > + }
> > > > > +
> > > > > + WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
> > > > > +  "Adding eviction fence to sync obj");
> > > > > + return true;
> > > > > +}
> > > > > +
> > > > >/**
> > > > > * amdgpu_sync_resv - sync to a reservation object
> > > > > *
> > > > > @@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device 
> > > > > *adev, struct amdgpu_sync *sync,
> > > > >   /* always sync to the exclusive fence */
> > > > >   f = dma_resv_excl_fence(resv);
> > > > > - r = amdgpu_sync_fence(sync, f);
> > > > > + dma_fence_chain_for_each(f, f) {
> > > > Jason has some helper for deep-walking fence chains/arrays here I think.
> > > > Might want to look into that, so that we have some consistency in how we
> > > > pile up multiple exclusive fences.
> > > Well those helpers are not from Jason, but from me :)
> > > 
> > > But no, for now the deep inspection is not really helpful here, since
> > > grabbing a reference to a certain chain node is what makes the handling
> > > easier and faster here.
> > > 
> > > Thinking more about it that should also make it possible for the garbage
> > > collection to kick in properly.
> > Hm this is tricky to reason about, but yeah with this here it's a true
> > chain, and you just need to connect them. But then if a buffer is on
> > multiple engines, collapsing things down occasionally might be useful.
> > 
> > But maybe we need to do that in the bigger rework where exclusive fences
> > are also just in the dma_fence_list with a "this is an exclusive one btw"
> > tag.
> > 
> > I think for the vk import case doing the deep scan makes more sense, it's
> > a once-per-frame thing, and there's a 

Re: [PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread Pan, Xinhui
Felix
What I am wondering is: if the CP hangs, can we assume all usermode queues
have stopped?
If so, we can do the cleanup work regardless of the return value of
execute_queues_cpsch().

> On Jun 17, 2021, at 20:11, Pan, Xinhui  wrote:
> 
> Felix
> what I am thinking of, like below, looks simpler. :)
> 
> @@ -1501,6 +1501,11 @@ static int destroy_queue_cpsch(struct 
> device_queue_manager *dqm,
>/* remove queue from list to prevent rescheduling after preemption */
>dqm_lock(dqm);
> 
> +   if (dqm->is_hws_hang) {
> +   retval = -EIO;
> +   goto failed_try_destroy_debugged_queue;
> +   }
> +
>if (qpd->is_debug) {
>/*
> * error, currently we do not allow to destroy a queue
> 
>> On Jun 17, 2021, at 20:02, Pan, Xinhui  wrote:
>> 
>> Handle queue destroy failure while the CP hangs.
>> Once the CP hangs, kfd triggers a GPU reset and sets related flags to stop
>> the driver from touching the queue. As we leave the queue as it is, we need
>> to keep its resources as they are too.
>> 
>> Regardless of whether user-space tries to destroy the queue again or not, we
>> need to put the queue back on the list so process termination will do the
>> cleanup work. What's more, if userspace tries to destroy the queue again, we
>> will not free its resources twice.
>> 
>> KFD returns -EIO in this case, so let's handle it now.
>> 
>> Some error logs seen without this patch are pasted below.
>> 
>> amdgpu: Can't create new usermode queue because -1 queues were already
>> created
>> 
>> refcount_t: underflow; use-after-free.
>> Call Trace:
>> kobject_put+0xe6/0x1b0
>> kfd_procfs_del_queue+0x37/0x50 [amdgpu]
>> pqm_destroy_queue+0x17a/0x390 [amdgpu]
>> kfd_ioctl_destroy_queue+0x57/0xc0 [amdgpu]
>> kfd_ioctl+0x463/0x690 [amdgpu]
>> 
>> BUG kmalloc-32 (Tainted: GW): Object already free
>> INFO: Allocated in allocate_sdma_mqd+0x30/0xb0 [amdgpu] age=4796 cpu=2
>> pid=2511
>> __slab_alloc+0x72/0x80
>> kmem_cache_alloc_trace+0x81f/0x8c0
>> allocate_sdma_mqd+0x30/0xb0 [amdgpu]
>> create_queue_cpsch+0xbf/0x470 [amdgpu]
>> pqm_create_queue+0x28d/0x6d0 [amdgpu]
>> kfd_ioctl_create_queue+0x492/0xae0 [amdgpu]
>> INFO: Freed in free_mqd_hiq_sdma+0x20/0x60 [amdgpu] age=2537 cpu=7
>> pid=2511
>> kfree+0x322/0x340
>> free_mqd_hiq_sdma+0x20/0x60 [amdgpu]
>> destroy_queue_cpsch+0x20c/0x330 [amdgpu]
>> pqm_destroy_queue+0x1a3/0x390 [amdgpu]
>> kfd_ioctl_destroy_queue+0x57/0xc0 [amdgpu]
>> 
>> Signed-off-by: xinhui pan 
>> ---
>> .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   | 13 +
>> drivers/gpu/drm/amd/amdkfd/kfd_process.c|  4 +++-
>> .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c  |  2 ++
>> 3 files changed, 18 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> index c069fa259b30..63a9a19a3987 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> @@ -1530,6 +1530,11 @@ static int destroy_queue_cpsch(struct 
>> device_queue_manager *dqm,
>>  KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
>>  if (retval == -ETIME)
>>  qpd->reset_wavefronts = true;
>> +/* In gpu reset? We leave the queue as it is, so do NOT
>> + * cleanup the resource.
>> + */
>> +else if (retval == -EIO)
>> +goto failed_execute_queue;
>>  if (q->properties.is_gws) {
>>  dqm->gws_queue_count--;
>>  qpd->mapped_gws_queue = false;
>> @@ -1551,6 +1556,14 @@ static int destroy_queue_cpsch(struct 
>> device_queue_manager *dqm,
>> 
>>  return retval;
>> 
>> +failed_execute_queue:
>> +/* Put queue back to the list, then we have chance to destroy it.
>> + * FIXME: we do NOT want the queue in the runlist again.
>> + */
>> +list_add(>list, >queues_list);
>> +qpd->queue_count++;
>> +if (q->properties.is_active)
>> +increment_queue_count(dqm, q->properties.type);
>> failed_try_destroy_debugged_queue:
>> 
>>  dqm_unlock(dqm);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index 09b98a83f670..984197e5929f 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -607,11 +607,13 @@ static int kfd_procfs_add_sysfs_files(struct 
>> kfd_process *p)
>> 
>> void kfd_procfs_del_queue(struct queue *q)
>> {
>> -if (!q)
>> +if (!q || !kobject_get_unless_zero(>kobj))
>>  return;
>> 
>>  kobject_del(>kobj);
>>  kobject_put(>kobj);
>> +/* paired with the get above */
>> +kobject_put(>kobj);
>> }
>> 
>> int kfd_process_create_wq(void)
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> index 

Re: [PATCH v3 1/8] ext4/xfs: add page refcount helper

2021-06-17 Thread Darrick J. Wong
On Thu, Jun 17, 2021 at 10:16:58AM -0500, Alex Sierra wrote:
> From: Ralph Campbell 
> 
> There are several places where ZONE_DEVICE struct pages assume a reference
> count == 1 means the page is idle and free. Instead of open coding this,
> add a helper function to hide this detail.
> 
> v2:
> [AS]: rename dax_layout_is_idle_page func to dax_page_unused
> 
> Signed-off-by: Ralph Campbell 
> Signed-off-by: Alex Sierra 
> ---
>  fs/dax.c|  4 ++--
>  fs/ext4/inode.c |  5 +
>  fs/xfs/xfs_file.c   |  4 +---
>  include/linux/dax.h | 10 ++
>  4 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 26d5dcd2d69e..321f4ddc6643 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -358,7 +358,7 @@ static void dax_disassociate_entry(void *entry, struct 
> address_space *mapping,
>   for_each_mapped_pfn(entry, pfn) {
>   struct page *page = pfn_to_page(pfn);
>  
> - WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> + WARN_ON_ONCE(trunc && !dax_layout_is_idle_page(page));
>   WARN_ON_ONCE(page->mapping && page->mapping != mapping);
>   page->mapping = NULL;
>   page->index = 0;
> @@ -372,7 +372,7 @@ static struct page *dax_busy_page(void *entry)
>   for_each_mapped_pfn(entry, pfn) {
>   struct page *page = pfn_to_page(pfn);
>  
> - if (page_ref_count(page) > 1)
> + if (!dax_layout_is_idle_page(page))
>   return page;
>   }
>   return NULL;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c173c8405856..9ee00186412f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3972,10 +3972,7 @@ int ext4_break_layouts(struct inode *inode)
>   if (!page)
>   return 0;
>  
> - error = ___wait_var_event(>_refcount,
> - atomic_read(>_refcount) == 1,
> - TASK_INTERRUPTIBLE, 0, 0,
> - ext4_wait_dax_page(ei));
> + error = dax_wait_page(ei, page, ext4_wait_dax_page);
>   } while (error == 0);
>  
>   return error;
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5b0f93f73837..39565fe5f817 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -782,9 +782,7 @@ xfs_break_dax_layouts(
>   return 0;
>  
>   *retry = true;
> - return ___wait_var_event(>_refcount,
> - atomic_read(>_refcount) == 1, TASK_INTERRUPTIBLE,
> - 0, 0, xfs_wait_dax_page(inode));
> + return dax_wait_page(inode, page, xfs_wait_dax_page);

Mechanically, this looks like a straightforward replacement, so:
Acked-by: Darrick J. Wong 

--D

>  }
>  
>  int
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index b52f084aa643..8b5da1d60dbc 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -243,6 +243,16 @@ static inline bool dax_mapping(struct address_space 
> *mapping)
>   return mapping->host && IS_DAX(mapping->host);
>  }
>  
> +static inline bool dax_page_unused(struct page *page)
> +{
> + return page_ref_count(page) == 1;
> +}
> +
> +#define dax_wait_page(_inode, _page, _wait_cb)   
> \
> + ___wait_var_event(&(_page)->_refcount,  \
> + dax_page_unused(_page), \
> + TASK_INTERRUPTIBLE, 0, 0, _wait_cb(_inode))
> +
>  #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
>  void hmem_register_device(int target_nid, struct resource *r);
>  #else
> -- 
> 2.17.1
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 2/2] drm/amdkfd: Walk through list with dqm lock hold

2021-06-17 Thread xinhui pan
Walk through the queue list with the dqm lock held to avoid any list
corruption.

Signed-off-by: xinhui pan 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22 ++-
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 63a9a19a3987..d62374746c93 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1722,7 +1722,7 @@ static int process_termination_cpsch(struct 
device_queue_manager *dqm,
struct qcm_process_device *qpd)
 {
int retval;
-   struct queue *q, *next;
+   struct queue *q;
struct kernel_queue *kq, *kq_next;
struct mqd_manager *mqd_mgr;
struct device_process_node *cur, *next_dpn;
@@ -1779,24 +1779,26 @@ static int process_termination_cpsch(struct 
device_queue_manager *dqm,
qpd->reset_wavefronts = false;
}
 
-   dqm_unlock(dqm);
-
-   /* Outside the DQM lock because under the DQM lock we can't do
-* reclaim or take other locks that others hold while reclaiming.
-*/
-   if (found)
-   kfd_dec_compute_active(dqm->dev);
-
/* Lastly, free mqd resources.
 * Do free_mqd() after dqm_unlock to avoid circular locking.
 */
-   list_for_each_entry_safe(q, next, >queues_list, list) {
+   while (!list_empty(>queues_list)) {
+   q = list_first_entry(>queues_list, struct queue, list);
mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
q->properties.type)];
list_del(>list);
qpd->queue_count--;
+   dqm_unlock(dqm);
mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
+   dqm_lock(dqm);
}
+   dqm_unlock(dqm);
+
+   /* Outside the DQM lock because under the DQM lock we can't do
+* reclaim or take other locks that others hold while reclaiming.
+*/
+   if (found)
+   kfd_dec_compute_active(dqm->dev);
 
return retval;
 }
-- 
2.25.1



Re: [PATCH] drm/amdgpu: remove unused parameter in amdgpu_gart_bind

2021-06-17 Thread Christian König

Am 17.06.21 um 16:09 schrieb Yifan Zhang:

After commit 72a616bb953329bd97c6d6d4c64f3f40ed788a36,
pagelist is no longer used in amdgpu_gart_bind(). Remove it.

Signed-off-by: Yifan Zhang 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  | 7 +++
  3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 1091ec5d3592..9fbd1e62948b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -300,7 +300,6 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t 
offset,
   * @adev: amdgpu_device pointer
   * @offset: offset into the GPU's gart aperture
   * @pages: number of pages to bind
- * @pagelist: pages to bind
   * @dma_addr: DMA addresses of pages
   * @flags: page table entry flags
   *
@@ -309,7 +308,7 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t 
offset,
   * Returns 0 for success, -EINVAL for failure.
   */
  int amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
-int pages, struct page **pagelist, dma_addr_t *dma_addr,
+int pages, dma_addr_t *dma_addr,
 uint64_t flags)
  {
if (!adev->gart.ready) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index e104022197ae..6ff87de620db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -64,7 +64,6 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t 
offset,
int pages, dma_addr_t *dma_addr, uint64_t flags,
void *dst);
  int amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t offset,
-int pages, struct page **pagelist,
-dma_addr_t *dma_addr, uint64_t flags);
+int pages, dma_addr_t *dma_addr, uint64_t flags);
  void amdgpu_gart_invalidate_tlb(struct amdgpu_device *adev);
  #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e8033b6f2395..6297363ab740 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -857,7 +857,7 @@ static int amdgpu_ttm_gart_bind(struct amdgpu_device *adev,
uint64_t page_idx = 1;
  
  		r = amdgpu_gart_bind(adev, gtt->offset, page_idx,

-   ttm->pages, gtt->ttm.dma_address, flags);
+   gtt->ttm.dma_address, flags);
if (r)
goto gart_bind_fail;
  
@@ -871,11 +871,10 @@ static int amdgpu_ttm_gart_bind(struct amdgpu_device *adev,

r = amdgpu_gart_bind(adev,
gtt->offset + (page_idx << PAGE_SHIFT),
ttm->num_pages - page_idx,
-   >pages[page_idx],
&(gtt->ttm.dma_address[page_idx]), flags);
} else {
r = amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages,
-ttm->pages, gtt->ttm.dma_address, flags);
+gtt->ttm.dma_address, flags);
}
  
  gart_bind_fail:

@@ -951,7 +950,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
/* bind pages into GART page tables */
gtt->offset = (u64)bo_mem->start << PAGE_SHIFT;
r = amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages,
-   ttm->pages, gtt->ttm.dma_address, flags);
+   gtt->ttm.dma_address, flags);
  
  	if (r)

DRM_ERROR("failed to bind %u pages at 0x%08llX\n",




Re: [PATCH v1] drm/amdgpu: fix framebuffer memory use after free

2021-06-17 Thread Michel Dänzer
On 2021-06-17 10:18 a.m., Lukasz Bartosik wrote:
> With option CONFIG_DEBUG_LIST enabled, the kernel log shows a list_add
> corruption warning. The warning originates from the drm_framebuffer_init()
> function, which adds the framebuffer to a list of framebuffers and is
> called by amdgpu_display_gem_fb_verify_and_init().
> If amdgpu_display_gem_fb_verify_and_init() encounters an error after
> calling drm_framebuffer_init(), the framebuffer memory is released
> in amdgpu_display_user_framebuffer_create() without removing the
> framebuffer from the list where it was added. Reuse of that memory by any
> other party causes corruption of the framebuffer linked list. This fix
> removes the framebuffer from the linked list and unregisters it on failure.
> 
> [...]
> 
> Fixes: 6eed95b00b45 ("drm/amd/display: Store tiling_flags in the 
> framebuffer.")

I didn't realize there was already an issue before f258907fdd835e "drm/amdgpu: 
Verify bo size can fit framebuffer size on init.". Looking at 
the Git history again, I agree there's already at least a theoretical issue in 
5.11, though I suspect it's harder to hit in practice.


> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index c13985fb35be..933190281b91 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -1085,14 +1085,17 @@ int amdgpu_display_gem_fb_verify_and_init(
>   mode_cmd->modifier[0]);
>  
>   ret = -EINVAL;
> - goto err;
> + goto err_fb_cleanup;
>   }
>  
>   ret = amdgpu_display_framebuffer_init(dev, rfb, mode_cmd, obj);
>   if (ret)
> - goto err;
> + goto err_fb_cleanup;
>  
>   return 0;
> +err_fb_cleanup:
> + drm_framebuffer_unregister_private(>base);
> + drm_framebuffer_cleanup(>base);
>  err:
>   drm_dbg_kms(dev, "Failed to verify and init gem fb: %d\n", ret);
>   rfb->base.obj[0] = NULL;
> 

There's a similar issue in amdgpu_display_gem_fb_init. 
https://patchwork.freedesktop.org/patch/439542/ fixes that as well, and seems 
simpler (though I'm biased obviously :).


Neither patch can be trivially cherry picked for fixing the issue in 5.11/5.12 
due to f258907fdd835e.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


[PATCH v1] drm/amdgpu: fix framebuffer memory use after free

2021-06-17 Thread Lukasz Bartosik
With option CONFIG_DEBUG_LIST enabled, the kernel log shows a list_add
corruption warning. The warning originates from the drm_framebuffer_init()
function, which adds the framebuffer to a list of framebuffers and is
called by amdgpu_display_gem_fb_verify_and_init().
If amdgpu_display_gem_fb_verify_and_init() encounters an error after
calling drm_framebuffer_init(), the framebuffer memory is released
in amdgpu_display_user_framebuffer_create() without removing the
framebuffer from the list where it was added. Reuse of that memory by any
other party causes corruption of the framebuffer linked list. This fix
removes the framebuffer from the linked list and unregisters it on failure.

[   23.252465] [ cut here ]
[   23.252469] list_add corruption. next->prev should be prev 
(9921c16203a8), but was 733a656c69665f6d. (next=9920debec508).
[   23.252506] WARNING: CPU: 1 PID: 1637 at lib/list_debug.c:25 
__list_add_valid+0x56/0x8f
[   23.252520] Modules linked in: xt_cgroup rfcomm cmac algif_hash 
algif_skcipher af_alg uinput xt_MASQUERADE uvcvideo videobuf2_vmalloc 
videobuf2_memops videobuf2_v4l2 videobuf2_common btusb btrtl btintel btbcm 
bluetooth ecdh_generic ecc iio_trig_sysfs snd_hda_codec_hdmi designware_i2s 
snd_hda_intel snd_intel_dspcfg i2c_piix4 snd_hda_codec snd_hwdep snd_hda_core 
acpi_als industrialio_triggered_buffer kfifo_buf industrialio 
snd_soc_acp_rn_rt5682_mach snd_soc_max98357a snd_soc_adau7002 
snd_soc_acp_rt5682_mach snd_soc_acp_da7219mx98357_mach acp_audio_dma 
snd_soc_da7219 ip6table_nat fuse ath10k_pci ath10k_core ath mac80211 cfg80211 
r8152 mii joydev
[   23.252595] CPU: 1 PID: 1637 Comm: DrmThread Not tainted 5.13.0-rc6 #22
[   23.252603] Hardware name: HP Grunt/Grunt, BIOS Google_Grunt.11031.162.0 
04/08/2021
[   23.252608] RIP: 0010:__list_add_valid+0x56/0x8f
[   23.252616] Code: 47 4c 39 fb 0f 95 c1 4c 39 f3 0f 95 c0 20 c8 5b 41 5e 41 
5f 5d c3 48 c7 c7 f4 37 9f bc 4c 89 f6 4c 89 f9 31 c0 e8 0c c7 c5 ff <0f> 0b eb 
16 48 c7 c7 c2 f6 99 bc 4c 89 fe 4c 89 f1 31 c0 e8 f4 c6
[   23.252622] RSP: 0018:ace940c87ba0 EFLAGS: 00010246
[   23.252629] RAX: 8ae6d9bca9cb6a00 RBX: 9920debec100 RCX: 8ae6d9bca9cb6a00
[   23.252634] RDX: 0027 RSI: dfff RDI: 9921ead12e48
[   23.252638] RBP: ace940c87bb8 R08:  R09: ace940c87938
[   23.252643] R10: dfff R11: bcc821f0 R12: 
[   23.252647] R13: 9920debec508 R14: 9921c16203a8 R15: 9920debec508
[   23.252652] FS:  7e6190717640() GS:9921ead0() 
knlGS:
[   23.252658] CS:  0010 DS:  ES:  CR0: 80050033
[   23.252662] CR2: 2e7a09404000 CR3: 000120644000 CR4: 001506e0
[   23.252667] Call Trace:
[   23.252676]  drm_framebuffer_init+0xfb/0x13c
[   23.252685]  amdgpu_display_gem_fb_verify_and_init+0x4b/0x10a
[   23.252693]  ? kmem_cache_alloc_trace+0x104/0x1dc
[   23.252703]  amdgpu_display_user_framebuffer_create+0xe0/0x195
[   23.252709]  drm_internal_framebuffer_create+0x2fd/0x3d4
[   23.252718]  drm_mode_addfb2+0x39/0xd6
[   23.252724]  ? drm_internal_framebuffer_create+0x3d4/0x3d4
[   23.252731]  drm_ioctl_kernel+0x99/0xfb
[   23.252739]  drm_ioctl+0x25a/0x3a4
[   23.252745]  ? drm_internal_framebuffer_create+0x3d4/0x3d4
[   23.252753]  amdgpu_drm_ioctl+0x49/0x7d
[   23.252760]  __se_sys_ioctl+0x7c/0xb8
[   23.252767]  do_syscall_64+0x5f/0xa9
[   23.252774]  ? exit_to_user_mode_prepare+0x68/0x81
[   23.252781]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   23.252790] RIP: 0033:0x7e619b2aff07
[   23.252796] Code: 3c 1c 48 f7 d8 49 39 c4 72 b2 e8 14 ff ff ff 85 c0 78 b7 
48 83 c4 08 4c 89 e0 5b 41 5c 41 5d 5d c3 66 90 b8 10 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 29 af 0b 00 f7 d8 64 89 01 48
[   23.252801] RSP: 002b:7e6190715e48 EFLAGS: 0246 ORIG_RAX: 
0010
[   23.252808] RAX: ffda RBX: 4120acd8347d3800 RCX: 7e619b2aff07
[   23.252812] RDX: 7e6190715e80 RSI: c06864b8 RDI: 001d
[   23.252817] RBP: 7e6190715e70 R08: 7e61907160e8 R09: 7e61907160f8
[   23.252821] R10: 7e6190716108 R11: 0246 R12: 001d
[   23.252825] R13: 0168 R14: 7e6190715e80 R15: c06864b8
[   23.252830] ---[ end trace 34051e69065d2c6d ]---

Fixes: 6eed95b00b45 ("drm/amd/display: Store tiling_flags in the framebuffer.")
Signed-off-by: Lukasz Bartosik 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index c13985fb35be..933190281b91 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -1085,14 +1085,17 @@ int amdgpu_display_gem_fb_verify_and_init(
mode_cmd->modifier[0]);
 
ret = -EINVAL;
-   goto err;

RE: [PATCH] drm/amd/pm: Disable SMU messages in navi10 sriov

2021-06-17 Thread Liu, Monk
[AMD Official Use Only]

Reviewed-by: Monk Liu 

Thanks 

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Yifan Zha  
Sent: Friday, June 11, 2021 6:49 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk ; Chen, JingWen ; Zha, 
YiFan(Even) 
Subject: [PATCH] drm/amd/pm: Disable SMU messages in navi10 sriov

[Why]
An SRIOV VF sending unsupported SMU messages leads to failures.

[How]
Disable the related messages under SRIOV.

Signed-off-by: Yifan Zha 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 78fe13183e8b..e1b019115e92 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -80,10 +80,10 @@ static struct cmn2asic_msg_mapping 
navi10_message_map[SMU_MSG_MAX_COUNT] = {
MSG_MAP(SetAllowedFeaturesMaskHigh, 
PPSMC_MSG_SetAllowedFeaturesMaskHigh,   0),
MSG_MAP(EnableAllSmuFeatures,   PPSMC_MSG_EnableAllSmuFeatures, 
0),
MSG_MAP(DisableAllSmuFeatures,  
PPSMC_MSG_DisableAllSmuFeatures,0),
-   MSG_MAP(EnableSmuFeaturesLow,   PPSMC_MSG_EnableSmuFeaturesLow, 
1),
-   MSG_MAP(EnableSmuFeaturesHigh,  
PPSMC_MSG_EnableSmuFeaturesHigh,1),
-   MSG_MAP(DisableSmuFeaturesLow,  
PPSMC_MSG_DisableSmuFeaturesLow,1),
-   MSG_MAP(DisableSmuFeaturesHigh, 
PPSMC_MSG_DisableSmuFeaturesHigh,   1),
+   MSG_MAP(EnableSmuFeaturesLow,   PPSMC_MSG_EnableSmuFeaturesLow, 
0),
+   MSG_MAP(EnableSmuFeaturesHigh,  
PPSMC_MSG_EnableSmuFeaturesHigh,0),
+   MSG_MAP(DisableSmuFeaturesLow,  
PPSMC_MSG_DisableSmuFeaturesLow,0),
+   MSG_MAP(DisableSmuFeaturesHigh, 
PPSMC_MSG_DisableSmuFeaturesHigh,   0),
MSG_MAP(GetEnabledSmuFeaturesLow,   
PPSMC_MSG_GetEnabledSmuFeaturesLow, 1),
MSG_MAP(GetEnabledSmuFeaturesHigh,  
PPSMC_MSG_GetEnabledSmuFeaturesHigh,1),
MSG_MAP(SetWorkloadMask,PPSMC_MSG_SetWorkloadMask,  
1),
-- 
2.25.1


[PATCH 1/1] drm/amdgpu: add helper function for vm pasid

2021-06-17 Thread Nirmoy Das
Clean up the code related to the VM PASID by adding helper functions.
This removes a lot of code duplication.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  17 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 176 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |   2 +-
 3 files changed, 96 insertions(+), 99 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index cbb932f97355..27851fb0e25b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1149,7 +1149,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
 {
struct amdgpu_device *adev = drm_to_adev(dev);
struct amdgpu_fpriv *fpriv;
-   int r, pasid;
+   int r;
 
/* Ensure IB tests are run on ring */
flush_delayed_work(>delayed_init_work);
@@ -1172,15 +1172,9 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
goto out_suspend;
}
 
-   pasid = amdgpu_pasid_alloc(16);
-   if (pasid < 0) {
-   dev_warn(adev->dev, "No more PASIDs available!");
-   pasid = 0;
-   }
-
-   r = amdgpu_vm_init(adev, >vm, pasid);
+   r = amdgpu_vm_init(adev, >vm);
if (r)
-   goto error_pasid;
+   goto free_fpriv;
 
fpriv->prt_va = amdgpu_vm_bo_add(adev, >vm, NULL);
if (!fpriv->prt_va) {
@@ -1208,10 +1202,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
 error_vm:
amdgpu_vm_fini(adev, >vm);
 
-error_pasid:
-   if (pasid)
-   amdgpu_pasid_free(pasid);
-
+free_fpriv:
kfree(fpriv);
 
 out_suspend:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 63975bda8e76..562c2c48a3a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -87,6 +87,69 @@ struct amdgpu_prt_cb {
struct dma_fence_cb cb;
 };
 
+static int amdgpu_vm_pasid_alloc(struct amdgpu_device *adev,
+struct amdgpu_vm *vm, unsigned int pasid)
+{
+   unsigned long flags;
+   int r;
+
+   if (!pasid)
+   return 0;
+
+   spin_lock_irqsave(>vm_manager.pasid_lock, flags);
+   r = idr_alloc(>vm_manager.pasid_idr, vm, pasid, pasid + 1,
+ GFP_ATOMIC);
+   spin_unlock_irqrestore(>vm_manager.pasid_lock, flags);
+   if (r < 0)
+   return r;
+
+   vm->pasid = pasid;
+   return 0;
+}
+static void amdgpu_vm_pasid_remove_id(struct amdgpu_device *adev,
+ unsigned int pasid)
+{
+   unsigned long flags;
+
+   if (!pasid)
+   return;
+
+   spin_lock_irqsave(>vm_manager.pasid_lock, flags);
+   idr_remove(>vm_manager.pasid_idr, pasid);
+   spin_unlock_irqrestore(>vm_manager.pasid_lock, flags);
+
+}
+
+static void amdgpu_vm_pasid_remove(struct amdgpu_device *adev,
+  struct amdgpu_vm *vm)
+{
+   amdgpu_vm_pasid_remove_id(adev, vm->pasid);
+   vm->pasid = 0;
+}
+
+static void amdgpu_vm_pasid_free(struct amdgpu_device *adev,
+struct amdgpu_vm *vm)
+{
+   if (!vm->pasid)
+   return;
+
+   amdgpu_pasid_free(vm->pasid);
+   amdgpu_vm_pasid_remove(adev, vm);
+}
+
+static struct amdgpu_vm *amdgpu_vm_pasid_find(struct amdgpu_device *adev,
+ unsigned int pasid)
+{
+   struct amdgpu_vm *vm;
+   unsigned long flags;
+
+   spin_lock_irqsave(>vm_manager.pasid_lock, flags);
+   vm = idr_find(>vm_manager.pasid_idr, pasid);
+   spin_unlock_irqrestore(>vm_manager.pasid_lock, flags);
+
+   return vm;
+}
+
 /*
  * vm eviction_lock can be taken in MMU notifiers. Make sure no reclaim-FS
  * happens while holding this lock anywhere to prevent deadlocks when
@@ -2859,17 +2922,17 @@ long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long 
timeout)
  *
  * @adev: amdgpu_device pointer
  * @vm: requested vm
- * @pasid: Process address space identifier
  *
  * Init @vm fields.
  *
  * Returns:
  * 0 for success, error for failure.
  */
-int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, u32 pasid)
+int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 {
struct amdgpu_bo *root_bo;
struct amdgpu_bo_vm *root;
+   unsigned int pasid;
int r, i;
 
vm->va = RB_ROOT_CACHED;
@@ -2940,19 +3003,15 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm, u32 pasid)
 
amdgpu_bo_unreserve(vm->root.bo);
 
-   if (pasid) {
-   unsigned long flags;
-
-   spin_lock_irqsave(>vm_manager.pasid_lock, flags);
-   r = idr_alloc(>vm_manager.pasid_idr, vm, pasid, pasid + 1,
- GFP_ATOMIC);
-  

[PATCH V2 3/7] drm/amdgpu: fix NAK-G generation during PCI-e link width switch

2021-06-17 Thread Evan Quan
A lot of NAK-Gs are generated when link width switching happens.
The workaround for this issue is to program the SPC to 4 symbols per
clock during bootup when the native PCIe link width is x4.
Change-Id: I7a4d751e44bddc4bd1e97860cb4f53dfadc02a2c
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 455d0425787c..2e1d12369cec 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -63,6 +63,10 @@
 #include "mxgpu_nv.h"
 #include "smuio_v11_0.h"
 #include "smuio_v11_0_6.h"
+#include "nbio/nbio_2_3_sh_mask.h"
+
+#define smnPCIE_LC_LINK_WIDTH_CNTL0x11140288
+#define smnPCIE_LC_CNTL6  0x111402ec
 
 static const struct amd_ip_funcs nv_common_ip_funcs;
 
@@ -1407,10 +1411,35 @@ static int nv_common_sw_fini(void *handle)
return 0;
 }
 
+static void nv_apply_lc_spc_mode_wa(struct amdgpu_device *adev)
+{
+   uint32_t reg_data = 0;
+   uint32_t link_width = 0;
+
+   reg_data = RREG32_PCIE(smnPCIE_LC_LINK_WIDTH_CNTL);
+   link_width = (reg_data & PCIE_LC_LINK_WIDTH_CNTL__LC_LINK_WIDTH_RD_MASK)
+   >> PCIE_LC_LINK_WIDTH_CNTL__LC_LINK_WIDTH_RD__SHIFT;
+
+   /*
+* Program PCIE_LC_CNTL6.LC_SPC_MODE_8GT to 0x2 (4 symbols per clock 
data)
+* if link_width is 0x3 (x4)
+*/
+   if (0x3 == link_width) {
+   reg_data = RREG32_PCIE(smnPCIE_LC_CNTL6);
+   reg_data &= ~PCIE_LC_CNTL6__LC_SPC_MODE_8GT_MASK;
+   reg_data |= (0x2 << PCIE_LC_CNTL6__LC_SPC_MODE_8GT__SHIFT);
+   WREG32_PCIE(smnPCIE_LC_CNTL6, reg_data);
+   }
+}
+
 static int nv_common_hw_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   if ((adev->asic_type >= CHIP_NAVI10) &&
+(adev->asic_type <= CHIP_NAVI12))
+   nv_apply_lc_spc_mode_wa(adev);
+
/* enable pcie gen2/3 link */
nv_pcie_gen3_enable(adev);
/* enable aspm */
-- 
2.29.0



[PATCH V2 1/7] drm/amdgpu: correct tcp harvest setting

2021-06-17 Thread Evan Quan
Add the missing settings for the SQC bits, and correct some confusing
logic around the active WGP bitmap calculation.

Change-Id: If4992e175fd61d5609b00328cbe21f487517d039
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 28 --
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 7bfe6f9d3a52..94942c6cae24 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -5109,6 +5109,7 @@ static void gfx_v10_0_tcp_harvest(struct amdgpu_device 
*adev)
for (k = 0; k < max_wgp_per_sh; k++) {
if (!(wgp_active_bitmap & (1 << k))) {
gcrd_targets_disable_tcp |= 3 
<< (2 * k);
+   gcrd_targets_disable_tcp |= 1 
<< (k + (max_wgp_per_sh * 2));
utcl_invreq_disable |= (3 << (2 
* k)) |
(3 << (2 * 
(max_wgp_per_sh + k)));
}
@@ -5116,13 +5117,13 @@ static void gfx_v10_0_tcp_harvest(struct amdgpu_device 
*adev)
 
tmp = RREG32_SOC15(GC, 0, 
mmUTCL1_UTCL0_INVREQ_DISABLE);
/* only override TCP & SQC bits */
-   tmp &= 0x << (4 * max_wgp_per_sh);
+   tmp &= 0xff00;
tmp |= (utcl_invreq_disable & 
utcl_invreq_disable_mask);
WREG32_SOC15(GC, 0, 
mmUTCL1_UTCL0_INVREQ_DISABLE, tmp);
 
tmp = RREG32_SOC15(GC, 0, 
mmGCRD_SA_TARGETS_DISABLE);
-   /* only override TCP bits */
-   tmp &= 0x << (2 * max_wgp_per_sh);
+   /* only override TCP & SQC bits */
+   tmp &= 0xfffc;
tmp |= (gcrd_targets_disable_tcp & 
gcrd_targets_disable_mask);
WREG32_SOC15(GC, 0, mmGCRD_SA_TARGETS_DISABLE, 
tmp);
}
@@ -9332,17 +9333,22 @@ static void 
gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device *
 
 static u32 gfx_v10_0_get_wgp_active_bitmap_per_sh(struct amdgpu_device *adev)
 {
-   u32 data, wgp_bitmask;
-   data = RREG32_SOC15(GC, 0, mmCC_GC_SHADER_ARRAY_CONFIG);
-   data |= RREG32_SOC15(GC, 0, mmGC_USER_SHADER_ARRAY_CONFIG);
+   u32 disabled_mask =
+   ~amdgpu_gfx_create_bitmask(adev->gfx.config.max_cu_per_sh >> 1);
+   u32 efuse_setting = 0;
+   u32 vbios_setting = 0;
+
+   efuse_setting = RREG32_SOC15(GC, 0, mmCC_GC_SHADER_ARRAY_CONFIG);
+   efuse_setting &= CC_GC_SHADER_ARRAY_CONFIG__INACTIVE_WGPS_MASK;
+   efuse_setting >>= CC_GC_SHADER_ARRAY_CONFIG__INACTIVE_WGPS__SHIFT;
 
-   data &= CC_GC_SHADER_ARRAY_CONFIG__INACTIVE_WGPS_MASK;
-   data >>= CC_GC_SHADER_ARRAY_CONFIG__INACTIVE_WGPS__SHIFT;
+   vbios_setting = RREG32_SOC15(GC, 0, mmGC_USER_SHADER_ARRAY_CONFIG);
+   vbios_setting &= GC_USER_SHADER_ARRAY_CONFIG__INACTIVE_WGPS_MASK;
+   vbios_setting >>= GC_USER_SHADER_ARRAY_CONFIG__INACTIVE_WGPS__SHIFT;
 
-   wgp_bitmask =
-   amdgpu_gfx_create_bitmask(adev->gfx.config.max_cu_per_sh >> 1);
+   disabled_mask |= efuse_setting | vbios_setting;
 
-   return (~data) & wgp_bitmask;
+   return (~disabled_mask);
 }
 
 static u32 gfx_v10_0_get_cu_active_bitmap_per_sh(struct amdgpu_device *adev)
-- 
2.29.0



[PATCH V2 4/7] drm/amdgpu: fix the hang caused by PCIe link width switch

2021-06-17 Thread Evan Quan
SMU had set all the necessary fields for a link width switch
but the width switch wasn't occurring because the link was idle
in the L1 state. Setting LC_L1_RECONFIG_EN=0x1 will allow width
switches to also be initiated while in L1 instead of waiting until
the link is back in L0.

Change-Id: I6315681f6fb194036b20991512dd88fa65bc0d56
Signed-off-by: Evan Quan 
---
V1->V2:
  - limit the change for Navi10 only
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 2e1d12369cec..f31c331a1c48 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -1432,6 +1432,15 @@ static void nv_apply_lc_spc_mode_wa(struct amdgpu_device 
*adev)
}
 }
 
+static void nv_apply_l1_link_width_reconfig_wa(struct amdgpu_device *adev)
+{
+   uint32_t reg_data = 0;
+
+   reg_data = RREG32_PCIE(smnPCIE_LC_LINK_WIDTH_CNTL);
+   reg_data |= PCIE_LC_LINK_WIDTH_CNTL__LC_L1_RECONFIG_EN_MASK;
+   WREG32_PCIE(smnPCIE_LC_LINK_WIDTH_CNTL, reg_data);
+}
+
 static int nv_common_hw_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -1440,6 +1449,9 @@ static int nv_common_hw_init(void *handle)
 (adev->asic_type <= CHIP_NAVI12))
nv_apply_lc_spc_mode_wa(adev);
 
+   if (adev->asic_type == CHIP_NAVI10)
+   nv_apply_l1_link_width_reconfig_wa(adev);
+
/* enable pcie gen2/3 link */
nv_pcie_gen3_enable(adev);
/* enable aspm */
-- 
2.29.0



[PATCH V2 2/7] drm/amdgpu: fix Navi1x tcp power gating hang when issuing lightweight invalidation

2021-06-17 Thread Evan Quan
Fix TCP hang when a lightweight invalidation happens on Navi1x.

Change-Id: I5000fefa9ec48a5e863372d298354bed1562b332
Signed-off-by: Evan Quan 
---
V1->V2:
  - Alex: use ARRAY_SIZE instead of hard code
  - limit the changes for Navi1x only
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 95 ++
 1 file changed, 95 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 94942c6cae24..5cfd4800d5be 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -7970,6 +7970,97 @@ static void 
gfx_v10_0_update_fine_grain_clock_gating(struct amdgpu_device *adev,
}
 }
 
+static void gfx_v10_0_apply_medium_grain_clock_gating_workaround(struct 
amdgpu_device *adev)
+{
+   uint32_t reg_data = 0;
+   uint32_t reg_idx = 0;
+   uint32_t i;
+
+   const uint32_t tcp_ctrl_regs[] = {
+   mmCGTS_SA0_WGP00_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP00_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP01_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP01_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP02_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP02_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP10_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP10_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP11_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP11_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP12_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP12_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP00_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP00_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP01_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP01_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP02_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP02_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP10_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP10_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP11_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP11_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP12_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP12_CU1_TCP_CTRL_REG
+   };
+
+   const uint32_t tcp_ctrl_regs_nv12[] = {
+   mmCGTS_SA0_WGP00_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP00_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP01_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP01_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP02_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP02_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP10_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP10_CU1_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP11_CU0_TCP_CTRL_REG,
+   mmCGTS_SA0_WGP11_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP00_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP00_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP01_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP01_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP02_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP02_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP10_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP10_CU1_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP11_CU0_TCP_CTRL_REG,
+   mmCGTS_SA1_WGP11_CU1_TCP_CTRL_REG,
+   };
+
+   const uint32_t sm_ctlr_regs[] = {
+   mmCGTS_SA0_QUAD0_SM_CTRL_REG,
+   mmCGTS_SA0_QUAD1_SM_CTRL_REG,
+   mmCGTS_SA1_QUAD0_SM_CTRL_REG,
+   mmCGTS_SA1_QUAD1_SM_CTRL_REG
+   };
+
+   if (adev->asic_type == CHIP_NAVI12) {
+   for (i = 0; i < ARRAY_SIZE(tcp_ctrl_regs_nv12); i++) {
+   reg_idx = 
adev->reg_offset[GC_HWIP][0][mmCGTS_SA0_WGP00_CU0_TCP_CTRL_REG_BASE_IDX] +
+ tcp_ctrl_regs_nv12[i];
+   reg_data = RREG32(reg_idx);
+   reg_data |= 
CGTS_SA0_WGP00_CU0_TCP_CTRL_REG__TCPI_LS_OVERRIDE_MASK;
+   WREG32(reg_idx, reg_data);
+   }
+   } else {
+   for (i = 0; i < ARRAY_SIZE(tcp_ctrl_regs); i++) {
+   reg_idx = 
adev->reg_offset[GC_HWIP][0][mmCGTS_SA0_WGP00_CU0_TCP_CTRL_REG_BASE_IDX] +
+ tcp_ctrl_regs[i];
+   reg_data = RREG32(reg_idx);
+   reg_data |= 
CGTS_SA0_WGP00_CU0_TCP_CTRL_REG__TCPI_LS_OVERRIDE_MASK;
+   WREG32(reg_idx, reg_data);
+   }
+   }
+
+   for (i = 0; i < ARRAY_SIZE(sm_ctlr_regs); i++) {
+   reg_idx = 
adev->reg_offset[GC_HWIP][0][mmCGTS_SA0_QUAD0_SM_CTRL_REG_BASE_IDX] +
+ sm_ctlr_regs[i];
+   reg_data = RREG32(reg_idx);
+   reg_data &= ~CGTS_SA0_QUAD0_SM_CTRL_REG__SM_MODE_MASK;
+   reg_data |= 2 << CGTS_SA0_QUAD0_SM_CTRL_REG__SM_MODE__SHIFT;
+   WREG32(reg_idx, reg_data);
+   }
+}
+
 static int gfx_v10_0_update_gfx_clock_gating(struct amdgpu_device *adev,
   

[PATCH V2 6/7] drm/amdgpu: update GFX MGCG settings

2021-06-17 Thread Evan Quan
Update GFX MGCG related settings.

Change-Id: I0b7b8e7c97859f99db5f52026abbb4d226c179df
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 7d9464611d26..73685492d36a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -7786,11 +7786,11 @@ static void 
gfx_v10_0_update_medium_grain_clock_gating(struct amdgpu_device *ade
 {
uint32_t data, def;
 
-   if (!(adev->cg_flags & AMD_CG_SUPPORT_GFX_MGCG))
+   if (!(adev->cg_flags & (AMD_CG_SUPPORT_GFX_MGCG | 
AMD_CG_SUPPORT_GFX_MGLS)))
return;
 
/* It is disabled by HW by default */
-   if (enable) {
+   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_GFX_MGCG)) {
/* 0 - Disable some blocks' MGCG */
WREG32_SOC15(GC, 0, mmGRBM_GFX_INDEX, 0xe000);
WREG32_SOC15(GC, 0, mmCGTT_WD_CLK_CTRL, 0xff00);
@@ -7803,6 +7803,7 @@ static void 
gfx_v10_0_update_medium_grain_clock_gating(struct amdgpu_device *ade
  RLC_CGTT_MGCG_OVERRIDE__RLC_CGTT_SCLK_OVERRIDE_MASK |
  RLC_CGTT_MGCG_OVERRIDE__GFXIP_MGCG_OVERRIDE_MASK |
  RLC_CGTT_MGCG_OVERRIDE__GFXIP_MGLS_OVERRIDE_MASK |
+ RLC_CGTT_MGCG_OVERRIDE__GFXIP_FGCG_OVERRIDE_MASK |
  RLC_CGTT_MGCG_OVERRIDE__ENABLE_CGTS_LEGACY_MASK);
 
if (def != data)
@@ -7825,13 +7826,15 @@ static void 
gfx_v10_0_update_medium_grain_clock_gating(struct amdgpu_device *ade
WREG32_SOC15(GC, 0, mmCP_MEM_SLP_CNTL, 
data);
}
}
-   } else {
+   } else if (!enable || !(adev->cg_flags & AMD_CG_SUPPORT_GFX_MGCG)) {
/* 1 - MGCG_OVERRIDE */
def = data = RREG32_SOC15(GC, 0, mmRLC_CGTT_MGCG_OVERRIDE);
data |= (RLC_CGTT_MGCG_OVERRIDE__RLC_CGTT_SCLK_OVERRIDE_MASK |
 RLC_CGTT_MGCG_OVERRIDE__GRBM_CGTT_SCLK_OVERRIDE_MASK |
 RLC_CGTT_MGCG_OVERRIDE__GFXIP_MGCG_OVERRIDE_MASK |
-RLC_CGTT_MGCG_OVERRIDE__GFXIP_MGLS_OVERRIDE_MASK);
+RLC_CGTT_MGCG_OVERRIDE__GFXIP_MGLS_OVERRIDE_MASK |
+RLC_CGTT_MGCG_OVERRIDE__GFXIP_FGCG_OVERRIDE_MASK |
+RLC_CGTT_MGCG_OVERRIDE__ENABLE_CGTS_LEGACY_MASK);
if (def != data)
WREG32_SOC15(GC, 0, mmRLC_CGTT_MGCG_OVERRIDE, data);
 
-- 
2.29.0



[PATCH V2 7/7] drm/amdgpu: update HDP LS settings

2021-06-17 Thread Evan Quan
Avoid unnecessary register programming on feature disablement.

Change-Id: Ia8ad4fb28cb23f80ddcf1399eace284e4d33bd90
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c | 85 +++
 1 file changed, 48 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
index 7a15e669b68d..5793977953cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
@@ -90,45 +90,56 @@ static void hdp_v5_0_update_mem_power_gating(struct 
amdgpu_device *adev,
 RC_MEM_POWER_SD_EN, 0);
WREG32_SOC15(HDP, 0, mmHDP_MEM_POWER_CTRL, hdp_mem_pwr_cntl);
 
-   /* only one clock gating mode (LS/DS/SD) can be enabled */
-   if (adev->cg_flags & AMD_CG_SUPPORT_HDP_LS) {
-   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
-HDP_MEM_POWER_CTRL,
-IPH_MEM_POWER_LS_EN, enable);
-   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
-HDP_MEM_POWER_CTRL,
-RC_MEM_POWER_LS_EN, enable);
-   } else if (adev->cg_flags & AMD_CG_SUPPORT_HDP_DS) {
-   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
-HDP_MEM_POWER_CTRL,
-IPH_MEM_POWER_DS_EN, enable);
-   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
-HDP_MEM_POWER_CTRL,
-RC_MEM_POWER_DS_EN, enable);
-   } else if (adev->cg_flags & AMD_CG_SUPPORT_HDP_SD) {
-   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
-HDP_MEM_POWER_CTRL,
-IPH_MEM_POWER_SD_EN, enable);
-   /* RC should not use shut down mode, fallback to ds */
-   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
-HDP_MEM_POWER_CTRL,
-RC_MEM_POWER_DS_EN, enable);
-   }
-
-   /* confirmed that IPH_MEM_POWER_CTRL_EN and RC_MEM_POWER_CTRL_EN have to
-* be set for SRAM LS/DS/SD */
-   if (adev->cg_flags & (AMD_CG_SUPPORT_HDP_LS | AMD_CG_SUPPORT_HDP_DS |
- AMD_CG_SUPPORT_HDP_SD)) {
-   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, 
HDP_MEM_POWER_CTRL,
-IPH_MEM_POWER_CTRL_EN, 1);
-   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, 
HDP_MEM_POWER_CTRL,
-RC_MEM_POWER_CTRL_EN, 1);
+   /* Already disabled above. The actions below are for "enabled" only */
+   if (enable) {
+   /* only one clock gating mode (LS/DS/SD) can be enabled */
+   if (adev->cg_flags & AMD_CG_SUPPORT_HDP_LS) {
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
+HDP_MEM_POWER_CTRL,
+IPH_MEM_POWER_LS_EN, 
1);
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
+HDP_MEM_POWER_CTRL,
+RC_MEM_POWER_LS_EN, 1);
+   } else if (adev->cg_flags & AMD_CG_SUPPORT_HDP_DS) {
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
+HDP_MEM_POWER_CTRL,
+IPH_MEM_POWER_DS_EN, 
1);
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
+HDP_MEM_POWER_CTRL,
+RC_MEM_POWER_DS_EN, 1);
+   } else if (adev->cg_flags & AMD_CG_SUPPORT_HDP_SD) {
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl,
+HDP_MEM_POWER_CTRL,
+IPH_MEM_POWER_SD_EN, 
1);
+   /* RC should not use shut down mode, fallback to ds  or 
ls if allowed */
+   if (adev->cg_flags & AMD_CG_SUPPORT_HDP_DS)
+   hdp_mem_pwr_cntl = 
REG_SET_FIELD(hdp_mem_pwr_cntl,
+
HDP_MEM_POWER_CTRL,
+
RC_MEM_POWER_DS_EN, 1);
+   else if (adev->cg_flags & AMD_CG_SUPPORT_HDP_LS)
+   

[PATCH V2 5/7] drm/amdgpu: correct clock gating settings on feature unsupported

2021-06-17 Thread Evan Quan
Clock gating setup is still performed even when the corresponding
CG feature is not supported. The tricky part is that disablement is
actually carried out no matter whether enablement or disablement was
requested, which does not seem logically right.
Considering that HW should already properly take care of the CG state,
just skip the corresponding clock gating setup when the feature
is not supported.

Change-Id: Ic0995cf3de9f36b59316a90a28b7c95a08f4dccd
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/athub_v2_0.c  | 12 +++--
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 69 ++--
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c  | 10 +++-
 drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c   | 10 +++-
 drivers/gpu/drm/amd/amdgpu/smuio_v11_0.c |  5 +-
 5 files changed, 83 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/athub_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/athub_v2_0.c
index 5b90efd6f6d0..3ac505d954c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/athub_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/athub_v2_0.c
@@ -36,9 +36,12 @@ athub_v2_0_update_medium_grain_clock_gating(struct 
amdgpu_device *adev,
 {
uint32_t def, data;
 
+   if (!(adev->cg_flags & AMD_CG_SUPPORT_MC_MGCG))
+   return;
+
def = data = RREG32_SOC15(ATHUB, 0, mmATHUB_MISC_CNTL);
 
-   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_MC_MGCG))
+   if (enable)
data |= ATHUB_MISC_CNTL__CG_ENABLE_MASK;
else
data &= ~ATHUB_MISC_CNTL__CG_ENABLE_MASK;
@@ -53,10 +56,13 @@ athub_v2_0_update_medium_grain_light_sleep(struct 
amdgpu_device *adev,
 {
uint32_t def, data;
 
+   if (!((adev->cg_flags & AMD_CG_SUPPORT_MC_LS) &&
+  (adev->cg_flags & AMD_CG_SUPPORT_HDP_LS)))
+   return;
+
def = data = RREG32_SOC15(ATHUB, 0, mmATHUB_MISC_CNTL);
 
-   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_MC_LS) &&
-   (adev->cg_flags & AMD_CG_SUPPORT_HDP_LS))
+   if (enable)
data |= ATHUB_MISC_CNTL__CG_MEM_LS_ENABLE_MASK;
else
data &= ~ATHUB_MISC_CNTL__CG_MEM_LS_ENABLE_MASK;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 5cfd4800d5be..7d9464611d26 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -7786,8 +7786,11 @@ static void 
gfx_v10_0_update_medium_grain_clock_gating(struct amdgpu_device *ade
 {
uint32_t data, def;
 
+   if (!(adev->cg_flags & AMD_CG_SUPPORT_GFX_MGCG))
+   return;
+
/* It is disabled by HW by default */
-   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_GFX_MGCG)) {
+   if (enable) {
/* 0 - Disable some blocks' MGCG */
WREG32_SOC15(GC, 0, mmGRBM_GFX_INDEX, 0xe000);
WREG32_SOC15(GC, 0, mmCGTT_WD_CLK_CTRL, 0xff00);
@@ -7854,22 +7857,34 @@ static void gfx_v10_0_update_3d_clock_gating(struct 
amdgpu_device *adev,
 {
uint32_t data, def;
 
+   if (!(adev->cg_flags & (AMD_CG_SUPPORT_GFX_3D_CGCG | 
AMD_CG_SUPPORT_GFX_3D_CGLS)))
+   return;
+
/* Enable 3D CGCG/CGLS */
-   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGCG)) {
+   if (enable) {
/* write cmd to clear cgcg/cgls ov */
def = data = RREG32_SOC15(GC, 0, mmRLC_CGTT_MGCG_OVERRIDE);
+
/* unset CGCG override */
-   data &= ~RLC_CGTT_MGCG_OVERRIDE__GFXIP_GFX3D_CG_OVERRIDE_MASK;
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGCG)
+   data &= 
~RLC_CGTT_MGCG_OVERRIDE__GFXIP_GFX3D_CG_OVERRIDE_MASK;
+
/* update CGCG and CGLS override bits */
if (def != data)
WREG32_SOC15(GC, 0, mmRLC_CGTT_MGCG_OVERRIDE, data);
+
/* enable 3Dcgcg FSM(0x363f) */
def = RREG32_SOC15(GC, 0, mmRLC_CGCG_CGLS_CTRL_3D);
-   data = (0x36 << 
RLC_CGCG_CGLS_CTRL_3D__CGCG_GFX_IDLE_THRESHOLD__SHIFT) |
-   RLC_CGCG_CGLS_CTRL_3D__CGCG_EN_MASK;
+   data = 0;
+
+   if (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGCG)
+   data = (0x36 << 
RLC_CGCG_CGLS_CTRL_3D__CGCG_GFX_IDLE_THRESHOLD__SHIFT) |
+   RLC_CGCG_CGLS_CTRL_3D__CGCG_EN_MASK;
+
if (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGLS)
data |= (0x000F << 
RLC_CGCG_CGLS_CTRL_3D__CGLS_REP_COMPANSAT_DELAY__SHIFT) |
RLC_CGCG_CGLS_CTRL_3D__CGLS_EN_MASK;
+
if (def != data)
WREG32_SOC15(GC, 0, mmRLC_CGCG_CGLS_CTRL_3D, data);
 
@@ -7882,9 +7897,14 @@ static void gfx_v10_0_update_3d_clock_gating(struct 
amdgpu_device *adev,
} else {
/* Disable CGCG/CGLS */
def = data = RREG32_SOC15(GC, 0, mmRLC_CGCG_CGLS_CTRL_3D);
+
/* disable 

RE: [PATCH] drm/amd/pm: Disable SMU messages in navi10 sriov

2021-06-17 Thread Zhang, Jack (Jian)
[AMD Official Use Only]

Reviewed-by: Jack Zhang 


-Original Message-
From: amd-gfx  On Behalf Of Chen, JingWen
Sent: Friday, June 11, 2021 6:52 PM
To: Zha, YiFan(Even) ; amd-gfx@lists.freedesktop.org
Cc: Zha, YiFan(Even) ; Liu, Monk 
Subject: RE: [PATCH] drm/amd/pm: Disable SMU messages in navi10 sriov

[AMD Official Use Only]

[AMD Official Use Only]

Acked-by: Jingwen Chen 

Best Regards,
JingWen Chen

-Original Message-
From: Yifan Zha 
Sent: Friday, June 11, 2021 6:49 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk ; Chen, JingWen ; Zha, 
YiFan(Even) 
Subject: [PATCH] drm/amd/pm: Disable SMU messages in navi10 sriov

[Why]
Under SRIOV, the VF sends unsupported SMU messages, which leads to failures.

[How]
Disable the related messages under SRIOV.

Signed-off-by: Yifan Zha 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 78fe13183e8b..e1b019115e92 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -80,10 +80,10 @@ static struct cmn2asic_msg_mapping 
navi10_message_map[SMU_MSG_MAX_COUNT] = {
MSG_MAP(SetAllowedFeaturesMaskHigh, 
PPSMC_MSG_SetAllowedFeaturesMaskHigh,   0),
MSG_MAP(EnableAllSmuFeatures,   PPSMC_MSG_EnableAllSmuFeatures, 
0),
MSG_MAP(DisableAllSmuFeatures,  
PPSMC_MSG_DisableAllSmuFeatures,0),
-   MSG_MAP(EnableSmuFeaturesLow,   PPSMC_MSG_EnableSmuFeaturesLow, 
1),
-   MSG_MAP(EnableSmuFeaturesHigh,  
PPSMC_MSG_EnableSmuFeaturesHigh,1),
-   MSG_MAP(DisableSmuFeaturesLow,  
PPSMC_MSG_DisableSmuFeaturesLow,1),
-   MSG_MAP(DisableSmuFeaturesHigh, 
PPSMC_MSG_DisableSmuFeaturesHigh,   1),
+   MSG_MAP(EnableSmuFeaturesLow,   PPSMC_MSG_EnableSmuFeaturesLow, 
0),
+   MSG_MAP(EnableSmuFeaturesHigh,  
PPSMC_MSG_EnableSmuFeaturesHigh,0),
+   MSG_MAP(DisableSmuFeaturesLow,  
PPSMC_MSG_DisableSmuFeaturesLow,0),
+   MSG_MAP(DisableSmuFeaturesHigh, 
PPSMC_MSG_DisableSmuFeaturesHigh,   0),
MSG_MAP(GetEnabledSmuFeaturesLow,   
PPSMC_MSG_GetEnabledSmuFeaturesLow, 1),
MSG_MAP(GetEnabledSmuFeaturesHigh,  
PPSMC_MSG_GetEnabledSmuFeaturesHigh,1),
MSG_MAP(SetWorkloadMask,PPSMC_MSG_SetWorkloadMask,  
1),
--
2.25.1



Re: [PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread Pan, Xinhui
Felix,
what I am thinking of, like below, looks simpler. :)

@@ -1501,6 +1501,11 @@ static int destroy_queue_cpsch(struct 
device_queue_manager *dqm,
/* remove queue from list to prevent rescheduling after preemption */
dqm_lock(dqm);
 
+   if (dqm->is_hws_hang) {
+   retval = -EIO;
+   goto failed_try_destroy_debugged_queue;
+   }
+
if (qpd->is_debug) {
/*
 * error, currently we do not allow to destroy a queue

> On Jun 17, 2021, at 20:02, Pan, Xinhui wrote:
> 
> Handle queue destroy failures while the CP is hung.
> Once the CP hangs, KFD triggers a GPU reset and sets related flags to
> stop the driver from touching the queue. As we leave the queue as it
> is, we need to keep its resources as they are too.
> 
> Regardless of whether user space tries to destroy the queue again, we
> need to put the queue back on the list so that process termination
> does the cleanup work. What's more, if user space tries to destroy the
> queue again, we will not free its resources twice.
> 
> KFD returns -EIO in this case, so let's handle it now.
> 
> Paste some error log below without this patch.
> 
> amdgpu: Can't create new usermode queue because -1 queues were already
> created
> 
> refcount_t: underflow; use-after-free.
> Call Trace:
> kobject_put+0xe6/0x1b0
> kfd_procfs_del_queue+0x37/0x50 [amdgpu]
> pqm_destroy_queue+0x17a/0x390 [amdgpu]
> kfd_ioctl_destroy_queue+0x57/0xc0 [amdgpu]
> kfd_ioctl+0x463/0x690 [amdgpu]
> 
> BUG kmalloc-32 (Tainted: GW): Object already free
> INFO: Allocated in allocate_sdma_mqd+0x30/0xb0 [amdgpu] age=4796 cpu=2
> pid=2511
> __slab_alloc+0x72/0x80
> kmem_cache_alloc_trace+0x81f/0x8c0
> allocate_sdma_mqd+0x30/0xb0 [amdgpu]
> create_queue_cpsch+0xbf/0x470 [amdgpu]
> pqm_create_queue+0x28d/0x6d0 [amdgpu]
> kfd_ioctl_create_queue+0x492/0xae0 [amdgpu]
> INFO: Freed in free_mqd_hiq_sdma+0x20/0x60 [amdgpu] age=2537 cpu=7
> pid=2511
> kfree+0x322/0x340
> free_mqd_hiq_sdma+0x20/0x60 [amdgpu]
> destroy_queue_cpsch+0x20c/0x330 [amdgpu]
> pqm_destroy_queue+0x1a3/0x390 [amdgpu]
> kfd_ioctl_destroy_queue+0x57/0xc0 [amdgpu]
> 
> Signed-off-by: xinhui pan 
> ---
> .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   | 13 +
> drivers/gpu/drm/amd/amdkfd/kfd_process.c|  4 +++-
> .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c  |  2 ++
> 3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index c069fa259b30..63a9a19a3987 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1530,6 +1530,11 @@ static int destroy_queue_cpsch(struct 
> device_queue_manager *dqm,
>   KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
>   if (retval == -ETIME)
>   qpd->reset_wavefronts = true;
> + /* In gpu reset? We leave the queue as it is, so do NOT
> +  * cleanup the resource.
> +  */
> + else if (retval == -EIO)
> + goto failed_execute_queue;
>   if (q->properties.is_gws) {
>   dqm->gws_queue_count--;
>   qpd->mapped_gws_queue = false;
> @@ -1551,6 +1556,14 @@ static int destroy_queue_cpsch(struct 
> device_queue_manager *dqm,
> 
>   return retval;
> 
> +failed_execute_queue:
> + /* Put queue back to the list, then we have chance to destroy it.
> +  * FIXME: we do NOT want the queue in the runlist again.
> +  */
> + list_add(>list, >queues_list);
> + qpd->queue_count++;
> + if (q->properties.is_active)
> + increment_queue_count(dqm, q->properties.type);
> failed_try_destroy_debugged_queue:
> 
>   dqm_unlock(dqm);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 09b98a83f670..984197e5929f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -607,11 +607,13 @@ static int kfd_procfs_add_sysfs_files(struct 
> kfd_process *p)
> 
> void kfd_procfs_del_queue(struct queue *q)
> {
> - if (!q)
> + if (!q || !kobject_get_unless_zero(>kobj))
>   return;
> 
>   kobject_del(>kobj);
>   kobject_put(>kobj);
> + /* paired with the get above */
> + kobject_put(>kobj);
> }
> 
> int kfd_process_create_wq(void)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 95a6c36cea4c..0588e552b8ec 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -373,6 +373,7 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, 
> unsigned int qid)
>   dqm = pqn->kq->dev->dqm;
>   dqm->ops.destroy_kernel_queue(dqm, 

[PATCH v2 1/2] drm/amdkfd: Fix some double free when destroy queue fails

2021-06-17 Thread xinhui pan
Handle queue destroy failures while the CP is hung.
Once the CP hangs, KFD triggers a GPU reset and sets related flags to
stop the driver from touching the queue. As we leave the queue as it
is, we need to keep its resources as they are too.

Regardless of whether user space tries to destroy the queue again, we
need to put the queue back on the list so that process termination
does the cleanup work. What's more, if user space tries to destroy the
queue again, we will not free its resources twice.

KFD returns -EIO in this case, so let's handle it now.

Paste some error log below without this patch.

amdgpu: Can't create new usermode queue because -1 queues were already
created

refcount_t: underflow; use-after-free.
Call Trace:
 kobject_put+0xe6/0x1b0
 kfd_procfs_del_queue+0x37/0x50 [amdgpu]
 pqm_destroy_queue+0x17a/0x390 [amdgpu]
 kfd_ioctl_destroy_queue+0x57/0xc0 [amdgpu]
 kfd_ioctl+0x463/0x690 [amdgpu]

BUG kmalloc-32 (Tainted: GW): Object already free
INFO: Allocated in allocate_sdma_mqd+0x30/0xb0 [amdgpu] age=4796 cpu=2
pid=2511
 __slab_alloc+0x72/0x80
 kmem_cache_alloc_trace+0x81f/0x8c0
 allocate_sdma_mqd+0x30/0xb0 [amdgpu]
 create_queue_cpsch+0xbf/0x470 [amdgpu]
 pqm_create_queue+0x28d/0x6d0 [amdgpu]
 kfd_ioctl_create_queue+0x492/0xae0 [amdgpu]
INFO: Freed in free_mqd_hiq_sdma+0x20/0x60 [amdgpu] age=2537 cpu=7
pid=2511
 kfree+0x322/0x340
 free_mqd_hiq_sdma+0x20/0x60 [amdgpu]
 destroy_queue_cpsch+0x20c/0x330 [amdgpu]
 pqm_destroy_queue+0x1a3/0x390 [amdgpu]
 kfd_ioctl_destroy_queue+0x57/0xc0 [amdgpu]

Signed-off-by: xinhui pan 
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   | 13 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c|  4 +++-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c  |  2 ++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c069fa259b30..63a9a19a3987 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1530,6 +1530,11 @@ static int destroy_queue_cpsch(struct 
device_queue_manager *dqm,
KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
if (retval == -ETIME)
qpd->reset_wavefronts = true;
+   /* In gpu reset? We leave the queue as it is, so do NOT
+* cleanup the resource.
+*/
+   else if (retval == -EIO)
+   goto failed_execute_queue;
if (q->properties.is_gws) {
dqm->gws_queue_count--;
qpd->mapped_gws_queue = false;
@@ -1551,6 +1556,14 @@ static int destroy_queue_cpsch(struct 
device_queue_manager *dqm,
 
return retval;
 
+failed_execute_queue:
+   /* Put queue back to the list, then we have chance to destroy it.
+* FIXME: we do NOT want the queue in the runlist again.
+*/
+   list_add(>list, >queues_list);
+   qpd->queue_count++;
+   if (q->properties.is_active)
+   increment_queue_count(dqm, q->properties.type);
 failed_try_destroy_debugged_queue:
 
dqm_unlock(dqm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 09b98a83f670..984197e5929f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -607,11 +607,13 @@ static int kfd_procfs_add_sysfs_files(struct kfd_process 
*p)
 
 void kfd_procfs_del_queue(struct queue *q)
 {
-   if (!q)
+   if (!q || !kobject_get_unless_zero(>kobj))
return;
 
kobject_del(>kobj);
kobject_put(>kobj);
+   /* paired with the get above */
+   kobject_put(>kobj);
 }
 
 int kfd_process_create_wq(void)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 95a6c36cea4c..0588e552b8ec 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -373,6 +373,7 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, 
unsigned int qid)
dqm = pqn->kq->dev->dqm;
dqm->ops.destroy_kernel_queue(dqm, pqn->kq, >qpd);
kernel_queue_uninit(pqn->kq, false);
+   pqn->kq = NULL;
}
 
if (pqn->q) {
@@ -396,6 +397,7 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, 
unsigned int qid)
kfree(pqn->q->properties.cu_mask);
pqn->q->properties.cu_mask = NULL;
uninit_queue(pqn->q);
+   pqn->q = NULL;
}
 
list_del(>process_queue_list);
-- 
2.25.1



[PATCH v3] drm/amd/amdgpu: Use IP discovery data to determine VCN enablement instead of MMSCH

2021-06-17 Thread Peng Ju Zhou
From: Bokun Zhang 

In the past, we used MMSCH to determine whether a VCN instance is enabled
or not. This is not reliable, since after an FLR MMSCH may report junk data.

It is better to use the IP discovery data.

Signed-off-by: Bokun Zhang 
Signed-off-by: Peng Ju Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |  8 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 23 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h   | 13 +
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 53 +--
 5 files changed, 61 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index f949ed8bfd9e..e02405a24fe3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -373,6 +373,14 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device 
*adev, int hw_id, int n
return -EINVAL;
 }
 
+
+int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
vcn_instance,
+int *major, int *minor, int *revision)
+{
+   return amdgpu_discovery_get_ip_version(adev, VCN_HWID,
+  vcn_instance, major, minor, 
revision);
+}
+
 void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
 {
struct binary_header *bhdr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
index 02e340cd3a38..48e6b88cfdfe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
@@ -32,6 +32,9 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device 
*adev);
 void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev);
 int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int 
number_instance,
 int *major, int *minor, int *revision);
+
+int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
vcn_instance,
+int *major, int *minor, int *revision);
 int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev);
 
 #endif /* __AMDGPU_DISCOVERY__ */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 9492b505e69b..84b025405578 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -287,6 +287,29 @@ int amdgpu_vcn_sw_fini(struct amdgpu_device *adev)
return 0;
 }
 
+bool amdgpu_vcn_is_disabled_vcn(struct amdgpu_device *adev, enum vcn_ring_type 
type, uint32_t vcn_instance)
+{
+   bool ret = false;
+
+   int major;
+   int minor;
+   int revision;
+
+   /* if cannot find IP data, then this VCN does not exist */
+   if (amdgpu_discovery_get_vcn_version(adev, vcn_instance, , 
, ) != 0)
+   return true;
+
+   if ((type == VCN_ENCODE_RING) && (revision & 
VCN_BLOCK_ENCODE_DISABLE_MASK)) {
+   ret = true;
+   } else if ((type == VCN_DECODE_RING) && (revision & 
VCN_BLOCK_DECODE_DISABLE_MASK)) {
+   ret = true;
+   } else if ((type == VCN_UNIFIED_RING) && (revision & 
VCN_BLOCK_QUEUE_DISABLE_MASK)) {
+   ret = true;
+   }
+
+   return ret;
+}
+
 int amdgpu_vcn_suspend(struct amdgpu_device *adev)
 {
unsigned size;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
index bc76cab67697..d74c62b49795 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
@@ -280,6 +280,16 @@ struct amdgpu_vcn_decode_buffer {
uint32_t pad[30];
 };
 
+#define VCN_BLOCK_ENCODE_DISABLE_MASK 0x80
+#define VCN_BLOCK_DECODE_DISABLE_MASK 0x40
+#define VCN_BLOCK_QUEUE_DISABLE_MASK 0xC0
+
+enum vcn_ring_type {
+   VCN_ENCODE_RING,
+   VCN_DECODE_RING,
+   VCN_UNIFIED_RING,
+};
+
 int amdgpu_vcn_sw_init(struct amdgpu_device *adev);
 int amdgpu_vcn_sw_fini(struct amdgpu_device *adev);
 int amdgpu_vcn_suspend(struct amdgpu_device *adev);
@@ -287,6 +297,9 @@ int amdgpu_vcn_resume(struct amdgpu_device *adev);
 void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring);
 void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring);
 
+bool amdgpu_vcn_is_disabled_vcn(struct amdgpu_device *adev,
+   enum vcn_ring_type type, uint32_t vcn_instance);
+
 int amdgpu_vcn_dec_ring_test_ring(struct amdgpu_ring *ring);
 int amdgpu_vcn_dec_ring_test_ib(struct amdgpu_ring *ring, long timeout);
 int amdgpu_vcn_dec_sw_ring_test_ring(struct amdgpu_ring *ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 798b6b4d8f46..c3580de3ea9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -85,16 +85,18 @@ static void vcn_v3_0_enc_ring_set_wptr(struct amdgpu_ring 
*ring);
 

Re: [PATCH] drm/amdgpu: Call drm_framebuffer_init last for framebuffer init

2021-06-17 Thread Michel Dänzer
On 2021-06-16 12:46 p.m., Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> Once drm_framebuffer_init has returned 0, the framebuffer is hooked up
> to the reference counting machinery and can no longer be destroyed with
> a simple kfree. Therefore, it must be called last.
> 
> Fixes: f258907fdd835e "drm/amdgpu: Verify bo size can fit framebuffer size on 
> init."

In case the commit log wasn't clear: If drm_framebuffer_init returns 0 but its 
caller then returns non-zero, there will likely be memory corruption fireworks 
down the road. The following led me to this fix:

[   12.891228] kernel BUG at lib/list_debug.c:25!
[...]
[   12.891263] RIP: 0010:__list_add_valid+0x4b/0x70
[...]
[   12.891324] Call Trace:
[   12.891330]  drm_framebuffer_init+0xb5/0x100 [drm]
[   12.891378]  amdgpu_display_gem_fb_verify_and_init+0x47/0x120 [amdgpu]
[   12.891592]  ? amdgpu_display_user_framebuffer_create+0x10d/0x1f0 [amdgpu]
[   12.891794]  amdgpu_display_user_framebuffer_create+0x126/0x1f0 [amdgpu]
[   12.891995]  drm_internal_framebuffer_create+0x378/0x3f0 [drm]
[   12.892036]  ? drm_internal_framebuffer_create+0x3f0/0x3f0 [drm]
[   12.892075]  drm_mode_addfb2+0x34/0xd0 [drm]
[   12.892115]  ? drm_internal_framebuffer_create+0x3f0/0x3f0 [drm]
[   12.892153]  drm_ioctl_kernel+0xe2/0x150 [drm]
[   12.892193]  drm_ioctl+0x3da/0x460 [drm]
[   12.892232]  ? drm_internal_framebuffer_create+0x3f0/0x3f0 [drm]
[   12.892274]  amdgpu_drm_ioctl+0x43/0x80 [amdgpu]
[   12.892475]  __se_sys_ioctl+0x72/0xc0
[   12.892483]  do_syscall_64+0x33/0x40
[   12.892491]  entry_SYSCALL_64_after_hwframe+0x44/0xae



-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH 1/2] drm/amdgpu: unwrap fence chains in the explicit sync fence

2021-06-17 Thread Christian König

Alex do want to review those so that we can close the ticket?

Thanks,
Christian.

On 14.06.21 at 19:45, Christian König wrote:

Unwrap the explicit fence if it is a dma_fence_chain and
sync to the first fence not matching the owner rules.

Signed-off-by: Christian König 
Acked-by: Daniel Vetter 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 118 +--
  1 file changed, 68 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 1b2ceccaf5b0..862eb3c1c4c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -28,6 +28,8 @@
   *Christian König 
   */
  
+#include 

+
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  #include "amdgpu_amdkfd.h"
@@ -186,6 +188,55 @@ int amdgpu_sync_vm_fence(struct amdgpu_sync *sync, struct 
dma_fence *fence)
return amdgpu_sync_fence(sync, fence);
  }
  
+/* Determine based on the owner and mode if we should sync to a fence or not */

+static bool amdgpu_sync_test_fence(struct amdgpu_device *adev,
+  enum amdgpu_sync_mode mode,
+  void *owner, struct dma_fence *f)
+{
+   void *fence_owner = amdgpu_sync_get_owner(f);
+
+   /* Always sync to moves, no matter what */
+   if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
+   return true;
+
+   /* We only want to trigger KFD eviction fences on
+* evict or move jobs. Skip KFD fences otherwise.
+*/
+   if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
+   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
+   return false;
+
+   /* Never sync to VM updates either. */
+   if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
+   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
+   return false;
+
+   /* Ignore fences depending on the sync mode */
+   switch (mode) {
+   case AMDGPU_SYNC_ALWAYS:
+   return true;
+
+   case AMDGPU_SYNC_NE_OWNER:
+   if (amdgpu_sync_same_dev(adev, f) &&
+   fence_owner == owner)
+   return false;
+   break;
+
+   case AMDGPU_SYNC_EQ_OWNER:
+   if (amdgpu_sync_same_dev(adev, f) &&
+   fence_owner != owner)
+   return false;
+   break;
+
+   case AMDGPU_SYNC_EXPLICIT:
+   return false;
+   }
+
+   WARN(debug_evictions && fence_owner == AMDGPU_FENCE_OWNER_KFD,
+"Adding eviction fence to sync obj");
+   return true;
+}
+
  /**
   * amdgpu_sync_resv - sync to a reservation object
   *
@@ -211,67 +262,34 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct 
amdgpu_sync *sync,
  
  	/* always sync to the exclusive fence */

f = dma_resv_excl_fence(resv);
-   r = amdgpu_sync_fence(sync, f);
+   dma_fence_chain_for_each(f, f) {
+   struct dma_fence_chain *chain = to_dma_fence_chain(f);
+
+   if (amdgpu_sync_test_fence(adev, mode, owner, chain ?
+  chain->fence : f)) {
+   r = amdgpu_sync_fence(sync, f);
+   dma_fence_put(f);
+   if (r)
+   return r;
+   break;
+   }
+   }
  
  	flist = dma_resv_shared_list(resv);

-   if (!flist || r)
-   return r;
+   if (!flist)
+   return 0;
  
  	for (i = 0; i < flist->shared_count; ++i) {

-   void *fence_owner;
-
f = rcu_dereference_protected(flist->shared[i],
  dma_resv_held(resv));
  
-		fence_owner = amdgpu_sync_get_owner(f);

-
-   /* Always sync to moves, no matter what */
-   if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED) {
+   if (amdgpu_sync_test_fence(adev, mode, owner, f)) {
r = amdgpu_sync_fence(sync, f);
if (r)
-   break;
-   }
-
-   /* We only want to trigger KFD eviction fences on
-* evict or move jobs. Skip KFD fences otherwise.
-*/
-   if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
-   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
-   continue;
-
-   /* Never sync to VM updates either. */
-   if (fence_owner == AMDGPU_FENCE_OWNER_VM &&
-   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
-   continue;
-
-   /* Ignore fences depending on the sync mode */
-   switch (mode) {
-   case AMDGPU_SYNC_ALWAYS:
-   break;
-
-   case AMDGPU_SYNC_NE_OWNER:
-   if (amdgpu_sync_same_dev(adev, f) &&
-   fence_owner == owner)
-  

[pull] amdgpu, amdkfd drm-next-5.14

2021-06-17 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.14.

The following changes since commit c707b73f0cfb1acc94a20389aecde65e6385349b:

  Merge tag 'amd-drm-next-5.14-2021-06-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (2021-06-10 13:47:13 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.14-2021-06-16

for you to fetch changes up to a4b0b97aace09716a635e1a64c7e54e51f4a0f51:

  drm: display: Fix duplicate field initialization in dcn31 (2021-06-15 17:25:42 -0400)


amd-drm-next-5.14-2021-06-16:

amdgpu:
- Aldebaran fixes
- Expose asic independent throttler status
- BACO fixes for navi1x
- Smartshift fixes
- Misc code cleanups
- RAS fixes for Sienna Cichlid
- Gamma verification fixes
- DC LTTPR fixes
- DP AUX timeout handling fixes
- GFX9, 10 powergating fixes

amdkfd:
- TLB flush fixes when using SDMA
- Locking fixes
- SVM fixes


Alex Sierra (1):
  drm/amdkfd: move CoherentHostAccess prop to HSA_CAPABILITY

Amber Lin (1):
  drm/amdkfd: Fix circular lock in nocpsch path

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.70

Aric Cyr (1):
  drm/amd/display: 3.2.140

Ashley Thomas (1):
  drm/amd/display: add DMUB registers to crash dump diagnostic data.

Aurabindo Pillai (1):
  drm/amd/display: add dummy PG callback for beige goby

David Galiffi (1):
  drm/amd/display: Updated variable name.

Dmytro Laktyushkin (1):
  drm/amd/display: Remove unnecessary blank lines

Eric Huang (1):
  drm/amdkfd: Add memory sync before TLB flush on unmap

Evan Quan (6):
  drm/amd/pm: drop the incomplete fix for Navi14 runpm issue
  drm/amd/pm: correct the runpm handling for BACO supported ASIC
  drm/amdgpu: make audio dev's D-state transition PMFW-aware
  drm/amd/pm: update the cached dpm feature status
  drm/amd/pm: correct the dpm features disablement for Navi1x
  drm/amd/pm: correct the power limits reporting on OOB supported

Felix Kuehling (2):
  drm/amdkfd: Disable SVM per GPU, not per process
  drm/amdgpu: Use spinlock_irqsave for pasid_lock

Graham Sider (9):
  drm/amd/pm: Add u64 throttler status field to gpu_metrics
  drm/amd/pm: Add ASIC independent throttle bits
  drm/amd/pm: Add common throttler translation func
  drm/amd/pm: Add arcturus throttler translation
  drm/amd/pm: Add navi1x throttler translation
  drm/amd/pm: Add sienna cichlid throttler translation
  drm/amd/pm: Add vangogh throttler translation
  drm/amd/pm: Add renoir throttler translation
  drm/amd/pm: Add aldebaran throttler translation

Guchun Chen (1):
  drm/amdgpu: use adev_to_drm macro for consistency (v2)

Hawking Zhang (9):
  drm/amdgpu: update psp gfx i/f to support dynamic GECC
  drm/amdgpu: allow different boot configs
  drm/amdgpu: add helper function to query gecc status in boot config
  drm/amdgpu: enable dynamic GECC support (v2)
  drm/amdgpu: add psp runtime db structures
  drm/amdgpu: add helper function to query psp runtime db entry (v2)
  drm/amdgpu: cache psp runtime boot_cfg_bitmask in sw_int
  drm/amdgpu: disable DRAM memory training when GECC is enabled
  drm/amdgpu: correct psp ucode arrary start address

Jiapeng Chong (2):
  drm/amd/display: Fix duplicate included clk_mgr.h
  drm/amd/display: use ARRAY_SIZE for base60_refresh_rates

John Clements (2):
  drm/amdgpu: Updated fw header structure source
  drm/amdgpu: Added support for loading auxiliary PSP FW

Jonathan Kim (1):
  drm/amdkfd: fix circular locking on get_wave_state

Josip Pavic (1):
  drm/amd/display: tune backlight ramping profiles

Lijo Lazar (1):
  drm/amd/pm: Only primary die supports power data

Mark Yacoub (1):
  drm/amd/display: Verify Gamma & Degamma LUT sizes in amdgpu_dm_atomic_check

Nirmoy Das (4):
  drm/amdkfd: use allowed domain for vmbo validation
  drm/amdgpu: remove amdgpu_vm_pt
  drm/amdgpu: parameterize ttm BO destroy callback
  drm/amdgpu: move shadow_list to amdgpu_bo_vm

Peng Ju Zhou (1):
  drm/amd/amdgpu: add instance_number check in amdgpu_discovery_get_ip_version

Po-Ting Chen (1):
  drm/amd/display: Change swizzle visual confirm reference pipe

Roman Li (1):
  drm/amd/display: move psr dm interface to separate files

Sathishkumar S (2):
  drm/amd/pm: support ss metrics read on renoir
  drm/amd/pm: support ss metrics read on yellow_carp

Wan Jiabing (3):
  drm: display: Remove duplicate include in dce110
  drm: display: Remove duplicated argument in dcn31
  drm: display: Fix duplicate field initialization in dcn31

Wenjing Liu (1):
  drm/amd/display: dp mst detection code refactor

Wesley Chalmers (14):
  drm/amd/display: Read LTTPR caps first on hotplug
  drm/amd/display: Move LTTPR cap read into its