Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-01 Thread Chen, Xiaogang
On 5/1/2024 5:56 PM, Philip Yang wrote: Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. On system with khugepaged enabled and user cases with THP buffer, the hmm_range_fault may take > 15 seconds to

Re: [PATCH] rock-dgb_defconfig: Update for Linux 6.7 with UBSAN

2024-04-16 Thread Chen, Xiaogang
On 4/15/2024 2:49 PM, Felix Kuehling wrote: make rock-dbg_defconfig make savedefconfig cp defconfig arch/x86/config/rock-dbg_defconfig This also enables

Re: [PATCH 1/2] amd/amdkfd: sync all devices to wait all processes being evicted

2024-04-02 Thread Chen, Xiaogang
On 4/1/2024 4:53 PM, Zhigang Luo wrote: If there are more than one device doing reset in parallel, the first device will call kfd_suspend_all_processes() to

Re: [PATCH] drm/amdkfd: Cleanup workqueue during module unload

2024-03-26 Thread Chen, Xiaogang
On 3/20/2024 5:52 PM, Mukul Joshi wrote: Destroy the high priority workqueue that handles interrupts during KFD node cleanup. Signed-off-by: Mukul Joshi ---

Re: [PATCH 1/3] amd/amdkfd: add a function to wait no process running in kfd

2024-03-26 Thread Chen, Xiaogang
On 3/25/2024 10:18 AM, Zhigang Luo wrote: Signed-off-by: Zhigang Luo Change-Id: I2a98d513c26107ac76ecf20e951c188afbc7ede6 ---

Re: [PATCH 1/2] drm/amdkfd: Document and define SVM event tracing macro

2024-02-15 Thread Chen, Xiaogang
On 2/15/2024 9:18 AM, Philip Yang wrote: Document how to use SMI system management interface to receive SVM events. Define SVM events message string format

Re: [PATCH v4] drm/amdkfd: Set correct svm range actual loc after spliting

2024-01-15 Thread Chen, Xiaogang
With a nitpick below, this patch is: Reviewed-by: Xiaogang Chen On 1/15/2024 4:02 PM, Philip Yang wrote: While svm range partial migrating to system memory, clear dma_addr vram domain flag, otherwise the future split will get incorrect vram_pages and actual loc. After range splitting, set new

Re: [PATCH] drm/amdkfd: Correct partial migration virtual addr

2024-01-15 Thread Chen, Xiaogang
This patch is: Reviewed-by: Xiaogang Chen On 1/15/2024 4:00 PM, Philip Yang wrote: Partial migration to system memory should use migrate.addr, not prange->start as virtual address to allocate system memory page. Fixes: 18eb61bd5a6a ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page

Re: [PATCH v3] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-11 Thread Chen, Xiaogang
On 1/11/2024 10:54 AM, Felix Kuehling wrote: On 2024-01-10 17:01, Philip Yang wrote: While svm range partial migrating to system memory, clear dma_addr vram domain flag, otherwise the future split will get incorrect vram_pages and actual loc. After range splitting, set new range and old

Re: [PATCH v2] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-09 Thread Chen, Xiaogang
On 1/9/2024 2:05 PM, Philip Yang wrote: After svm range partial migrating to system memory, unmap to cleanup the corresponding dma_addr vram domain flag, otherwise the future split will get incorrect vram_pages and actual loc. After range splitting, set new range and old range actual_loc: new

Re: [PATCH] amd/amdkfd: Set correct svm range actual loc after spliting

2024-01-08 Thread Chen, Xiaogang
With a nitpick below, this patch is: Reviewed-by: Xiaogang Chen On 1/8/2024 4:36 PM, Philip Yang wrote: After range splitting, set new range and old range actual_loc: new range actual_loc is 0 if new->vram_pages is 0. old range actual_loc is 0 if old->vram_pages - new->vram_pages == 0.

Re: [PATCH v5 2/2] drm/amdkfd: Bump KFD ioctl version

2024-01-08 Thread Chen, Xiaogang
Reviewed-by: Xiaogang Chen On 1/3/2024 5:15 PM, Felix Kuehling wrote: This is not strictly a change in the IOCTL API. This version bump is meant to indicate to user mode the presence of a number of changes and fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl

Re: [PATCH v3] drm/amdkfd: Use partial hmm page walk during buffer validation in SVM

2023-12-11 Thread Chen, Xiaogang
On 12/11/2023 8:42 AM, Philip Yang wrote: On 2023-12-06 10:24, Xiaogang.Chen wrote: From: Xiaogang Chen v2: -not need calculate vram page number for new registered svm range, only do it for split vram pages. v3: -use dma address to calculate vram page number of split svm range; use

Re: [PATCH v2] drm/amdkfd: Use partial hmm page walk during buffer validation in SVM

2023-12-06 Thread Chen, Xiaogang
On 12/5/2023 12:48 PM, Philip Yang wrote: On 2023-12-04 15:23, Xiaogang.Chen wrote: From: Xiaogang Chen v2: -not need calculate vram page number for new registered svm range, only do it for split vram pages. SVM uses hmm page walk to valid buffer before map to gpu vm. After have partial

Re: [PATCH] drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM

2023-11-27 Thread Chen, Xiaogang
On 11/22/2023 2:12 PM, Felix Kuehling wrote: On 2023-11-14 16:01, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration/mapping for gpu/cpu page faults in SVM according to migration granularity(default 2MB). A svm range may include pages from both system ram

Re: [PATCH 5/6] drm/amdkfd: Import DMABufs for interop through DRM

2023-11-08 Thread Chen, Xiaogang
On 11/8/2023 5:26 PM, Felix Kuehling wrote: On 2023-11-08 18:20, Chen, Xiaogang wrote: On 11/7/2023 10:58 AM, Felix Kuehling wrote: Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and rem

Re: [PATCH 5/6] drm/amdkfd: Import DMABufs for interop through DRM

2023-11-08 Thread Chen, Xiaogang
On 11/7/2023 10:58 AM, Felix Kuehling wrote: Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by: Felix Kuehling ---

Re: [PATCH v3] drm/amdkfd: Use partial mapping in GPU page faults

2023-10-29 Thread Chen, Xiaogang
On 10/23/2023 6:08 PM, Felix Kuehling wrote: On 2023-10-20 17:53, Xiaogang.Chen wrote: From: Xiaogang Chen After partial migration to recover GPU page fault this patch does GPU vm space mapping for same page range that got migrated instead of mapping all pages of svm range in which the

Re: [PATCH] drm/amdkfd: Use partial mapping in GPU page fault recovery

2023-10-19 Thread Chen, Xiaogang
On 10/19/2023 2:40 PM, Philip Yang wrote: On 2023-10-19 12:20, Chen, Xiaogang wrote: On 10/19/2023 11:08 AM, Philip Yang wrote: On 2023-10-19 10:24, Xiaogang.Chen wrote: From: Xiaogang Chen After partial migration to recover GPU page fault this patch does GPU vm space mapping

Re: [PATCH] drm/amdkfd: Use partial mapping in GPU page fault recovery

2023-10-19 Thread Chen, Xiaogang
On 10/19/2023 11:08 AM, Philip Yang wrote: On 2023-10-19 10:24, Xiaogang.Chen wrote: From: Xiaogang Chen After partial migration to recover GPU page fault this patch does GPU vm space mapping for same page range that got migrated instead of mapping all pages of svm range in which the page

Re: [PATCH] drm/amdkfd: Fix shift out-of-bounds issue

2023-10-18 Thread Chen, Xiaogang
On 10/18/2023 9:53 AM, Philip Yang wrote: The 255 granularity is from recent Thunk change to increase CWSR area granularity. I think we also

Re: [PATCH v2] drm/amdgpu: Correctly use bo_va->ref_count in compute VMs

2023-10-12 Thread Chen, Xiaogang
On 10/12/2023 12:48 PM, Felix Kuehling wrote: On 2023-10-12 12:34, Xiaogang.Chen wrote: From: Xiaogang Chen This is needed to correctly handle BOs imported into compute VM from gfx. Both kfd and gfx should use same bo_va and set bo_va->ref_count correctly when map the Bos into same VM,

Re: [PATCH] Find bo_va before create it when map bo into compute VM

2023-10-12 Thread Chen, Xiaogang
On 10/12/2023 10:20 AM, Felix Kuehling wrote: On 2023-10-11 19:36, Xiaogang.Chen wrote: From: Xiaogang Chen Since you are the author of this updated patch, you should also add your Signed-off-by below. ok, thanks. This is needed to correctly handle BOs imported into compute VM from

Re: [PATCH] Find bo_va before create it when map bo into compute VM

2023-10-12 Thread Chen, Xiaogang
On 10/12/2023 2:35 AM, Christian König wrote: The subject line somehow got messed up. There should be an drm/amdgpu: or drm/amdkfd: prefix. yes, will resend it. Regards Xiaogang. Regards, Christian. Am 12.10.23 um 01:36 schrieb Xiaogang.Chen: From: Xiaogang Chen This is needed to

Re: [PATCH v2 5/7] drm/amdkfd: Check bitmap_mapped flag to skip retry fault

2023-10-11 Thread Chen, Xiaogang
On 10/10/2023 9:40 AM, Philip Yang wrote: Remove prange validate_timestamp which is not accurate for multiple GPUs. Use the bitmap_mapped flag to skip the

Re: [PATCH v2 3/7] amd/amdkfd: Add granularity bitmap mapped to gpu flag

2023-10-11 Thread Chen, Xiaogang
On 10/10/2023 9:40 AM, Philip Yang wrote: Replace prange->mapped_to_gpu with prange->bitmap_mapped[], which is per GPU flag and based on prange granularity,

Re: [PATCH v2 1/7] drm/amdkfd: Wait vm update fence after retry fault recovered

2023-10-11 Thread Chen, Xiaogang
On 10/10/2023 9:40 AM, Philip Yang wrote: If using sdma update GPU page table, kfd flush tlb does nothing if vm update fence callback doesn't update

Re: [PATCH v4] drm/amdkfd: Use partial migrations in GPU page faults

2023-10-05 Thread Chen, Xiaogang
On 10/5/2023 8:25 AM, Philip Yang wrote: Sorry for the late reply, just notice 2 other issues: 1. function svm_range_split_by_granularity can be removed now. yes, the code has been sent to gerrit and merged. Will do it next time. 2. svm_range_restore_pages should map partial range to

Re: [PATCH v4] drm/amdkfd: Use partial migrations in GPU page faults

2023-10-04 Thread Chen, Xiaogang
On 10/4/2023 1:47 PM, Felix Kuehling wrote: On 2023-10-03 19:31, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. A svm range may include

Re: [PATCH 3/3] drm/amdkfd: Check bitmap_mapped flag to skip retry fault

2023-10-02 Thread Chen, Xiaogang
On 9/29/2023 9:11 AM, Philip Yang wrote: Use bitmap_mapped flag to check if range already mapped to the specific GPU, to skip the retry fault from different

Re: [PATCH 2/3] amd/amdkfd: Unmap range from GPUs based on granularity

2023-10-02 Thread Chen, Xiaogang
On 9/29/2023 9:11 AM, Philip Yang wrote: Align unmap range start and last address to granularity boundary. Skip unmap if range is already unmapped from GPUs.

Re: [PATCH 1/3] amd/amdkfd: Add granularity bitmap mapped to gpu flag

2023-10-02 Thread Chen, Xiaogang
On 9/29/2023 9:11 AM, Philip Yang wrote: Replace prange->mapped_to_gpu with prange->bitmap_mapped[], which is based on prange granularity, updated when map

Re: [PATCH v3] drm/amdkfd: Use partial migrations in GPU page faults

2023-09-29 Thread Chen, Xiaogang
On 9/29/2023 2:48 PM, Felix Kuehling wrote: On 2023-09-29 15:44, Chen, Xiaogang wrote: On 9/29/2023 1:41 PM, Felix Kuehling wrote: On 2023-09-29 14:25, Chen, Xiaogang wrote: On 9/28/2023 4:26 PM, Felix Kuehling wrote: On 2023-09-20 13:32, Xiaogang.Chen wrote: From: Xiaogang Chen

Re: [PATCH v3] drm/amdkfd: Use partial migrations in GPU page faults

2023-09-29 Thread Chen, Xiaogang
On 9/29/2023 1:41 PM, Felix Kuehling wrote: On 2023-09-29 14:25, Chen, Xiaogang wrote: On 9/28/2023 4:26 PM, Felix Kuehling wrote: On 2023-09-20 13:32, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity

Re: [PATCH v3] drm/amdkfd: Use partial migrations in GPU page faults

2023-09-29 Thread Chen, Xiaogang
On 9/28/2023 4:26 PM, Felix Kuehling wrote: On 2023-09-20 13:32, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. A svm range may include

Re: [PATCH v3] drm/amdkfd: Use partial migrations in GPU page faults

2023-09-27 Thread Chen, Xiaogang
ping for review. On 9/20/2023 12:32 PM, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. A svm range may include pages from both system ram and

Re: [PATCH] drm/amdkfd: Wait vm update fence after retry fault recovered

2023-09-27 Thread Chen, Xiaogang
On 9/27/2023 9:29 AM, Philip Yang wrote: On 2023-09-26 16:43, Chen, Xiaogang wrote: On 9/22/2023 4:37 PM, Philip Yang wrote: Otherwise kfd flush tlb

Re: [PATCH] drm/amdkfd: Fix a race condition of vram buffer unref in svm code

2023-09-27 Thread Chen, Xiaogang
On 9/27/2023 9:19 AM, Eric Huang wrote: On 2023-09-26 23:00, Xiaogang.Chen wrote: From: Xiaogang Chen prange->svm_bo unref can happen in both mmu callback

Re: [PATCH] drm/amdkfd: Wait vm update fence after retry fault recovered

2023-09-26 Thread Chen, Xiaogang
On 9/22/2023 4:37 PM, Philip Yang wrote: Otherwise kfd flush tlb does nothing if vm update fence callback doesn't update vm->tlb_seq. H/W will generate retry

Re: [PATCH] drm/amdkfd: Fix the svm_bo refcount waring

2023-09-25 Thread Chen, Xiaogang
On 9/25/2023 8:19 AM, Philip Yang wrote: On 2023-09-25 05:32, Jesse Zhang wrote: Fix the svm_bo refcount warning by checking the refcount before

Re: [PATCH] drm/amdkfd: fix some race conditions in vram buffer alloc/free of svm code

2023-09-20 Thread Chen, Xiaogang
On 9/20/2023 9:55 AM, Felix Kuehling wrote: On 2023-09-20 2:17, Xiaogang.Chen wrote: From: Xiaogang Chen This patch fixes: 1: ref number of prange's svm_bo got decreased by an async call from hmm. When wait svm_bo of prange got released we should also wait prange->svm_bo become NULL,

Re: [PATCH v2] drm/amdkfd: handle errors from svm validate and map

2023-09-15 Thread Chen, Xiaogang
On 9/15/2023 4:20 PM, Philip Yang wrote: On 2023-09-15 17:06, Chen, Xiaogang wrote: On 9/15/2023 8:28 AM, Philip Yang wrote: If new range is split

Re: [PATCH v2] drm/amdkfd: handle errors from svm validate and map

2023-09-15 Thread Chen, Xiaogang
On 9/15/2023 8:28 AM, Philip Yang wrote: If new range is split to multiple pranges with max_svm_range_pages alignment and added to update_list, svm

Re: [PATCH v2] drm/amdkfd: Use partial migrations in GPU page faults

2023-09-14 Thread Chen, Xiaogang
On 9/13/2023 5:03 PM, Felix Kuehling wrote: On 2023-09-11 10:04, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. A svm range may include

Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults

2023-09-05 Thread Chen, Xiaogang
On 9/5/2023 9:02 AM, Philip Yang wrote: On 2023-08-31 17:29, Chen, Xiaogang wrote: On 8/31/2023 3:59 PM, Felix Kuehling wrote: On 2023-08-31 16:33, Chen, Xiaogang wrote: That said, I'm not actually sure why we're freeing the DMA address array after migration to RAM at all. I think we

Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults

2023-08-31 Thread Chen, Xiaogang
On 8/31/2023 3:59 PM, Felix Kuehling wrote: On 2023-08-31 16:33, Chen, Xiaogang wrote: That said, I'm not actually sure why we're freeing the DMA address array after migration to RAM at all. I think we still need it even when we're using VRAM. We call svm_range_dma_map

Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults

2023-08-31 Thread Chen, Xiaogang
On 8/31/2023 1:00 PM, Felix Kuehling wrote: On 2023-08-30 19:02, Chen, Xiaogang wrote: On 8/30/2023 3:56 PM, Felix Kuehling wrote: On 2023-08-30 15:39, Chen, Xiaogang wrote: On 8/28/2023 5:37 PM, Felix Kuehling wrote: On 2023-08-28 16:57, Chen, Xiaogang wrote: On 8/28/2023 2:06 PM

Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults

2023-08-30 Thread Chen, Xiaogang
On 8/30/2023 3:56 PM, Felix Kuehling wrote: On 2023-08-30 15:39, Chen, Xiaogang wrote: On 8/28/2023 5:37 PM, Felix Kuehling wrote: On 2023-08-28 16:57, Chen, Xiaogang wrote: On 8/28/2023 2:06 PM, Felix Kuehling wrote: On 2023-08-24 18:08, Xiaogang.Chen wrote: From: Xiaogang Chen

Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults

2023-08-30 Thread Chen, Xiaogang
On 8/28/2023 5:37 PM, Felix Kuehling wrote: On 2023-08-28 16:57, Chen, Xiaogang wrote: On 8/28/2023 2:06 PM, Felix Kuehling wrote: On 2023-08-24 18:08, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity

Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults

2023-08-28 Thread Chen, Xiaogang
On 8/28/2023 2:06 PM, Felix Kuehling wrote: On 2023-08-24 18:08, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. Now a svm range may

Re: [PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU

2023-08-11 Thread Chen, Xiaogang
calls to cond_resched(). But then I would expect cond_resched() to fix the problem, according to this document. Regards, Felix On 2023-08-11 17:27, Chen, Xiaogang wrote: On 8/11/2023 4:22 PM, Felix Kuehling wrote: On 2023-08-11 17:12, Chen, Xiaogang wrote: I know the original jira ticket. The syst

Re: [PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU

2023-08-11 Thread Chen, Xiaogang
On 8/11/2023 4:22 PM, Felix Kuehling wrote: On 2023-08-11 17:12, Chen, Xiaogang wrote: I know the original jira ticket. The system got RCU cpu stall, then kernel enter panic, then no response or ssh. This patch let prange list update task yield cpu after each range update. It can prevent

Re: [PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU

2023-08-11 Thread Chen, Xiaogang
I know the original jira ticket. The system got RCU cpu stall, then kernel enter panic, then no response or ssh. This patch let prange list update task yield cpu after each range update. It can prevent task holding mm lock too long. mm lock is rw_semaphore, not RCU mechanism. Can you

Re: [PATCH] drm/amdgpu: Add a low priority scheduler for VRAM clearing

2023-05-18 Thread Chen, Xiaogang
On 5/17/2023 5:10 PM, Felix Kuehling wrote: On 2023-05-17 17:40, Mukul Joshi wrote: Add a low priority DRM scheduler for VRAM clearing instead of using the

Re: [PATCH] drm/amdkfd: Fix some issues at userptr buffer validation process.

2023-04-19 Thread Chen, Xiaogang
On 4/18/2023 6:17 PM, Felix Kuehling wrote: On 2023-04-13 23:27, Chen, Xiaogang wrote: On 4/13/2023 3:08 PM, Felix Kuehling wrote: Am 2023-04-12 um 02:14 schrieb Xiaogang.Chen: From: Xiaogang Chen Notice userptr buffer restore process has following issues: 1

Re: [PATCH] drm/amdkfd: Fix some issues at userptr buffer validation process.

2023-04-13 Thread Chen, Xiaogang
On 4/13/2023 3:08 PM, Felix Kuehling wrote: Am 2023-04-12 um 02:14 schrieb Xiaogang.Chen: From: Xiaogang Chen Notice userptr buffer restore process has following issues: 1: amdgpu_ttm_tt_get_user_pages can fail(-EFAULT). If it failed we should not set it valid(mem->invalid = 0). In this

Re: [PATCH] drm/amdkfd: Change WARN to pr_debug when same userptr BOs got invalidated by mmu.

2023-04-11 Thread Chen, Xiaogang
On 4/10/2023 2:58 PM, Felix Kuehling wrote: On 2023-04-10 10:36, Xiaogang.Chen wrote: From: Xiaogang Chen During KFD restore evicted userptr BOs mmu invalidate callback may invalidate same userptr BOs that have been just restored. When KFD restore process detects it KFD will reschedule

Re: [pull] amdgpu, amdkfd drm-fixes-6.3

2023-03-09 Thread Chen, Xiaogang
On 3/9/2023 11:32 AM, Alex Deucher wrote: On Thu, Mar 9, 2023 at 12:16 PM Felix Kuehling wrote: Am 2023-03-08 um 23:38 schrieb Alex Deucher: Hi Dave,

Re: [PATCH] drm/amdkfd: Get prange->offset after svm_range_vram_node_new

2023-03-08 Thread Chen, Xiaogang
On 3/8/2023 11:11 AM, Felix Kuehling wrote: On 2023-03-08 02:45, Xiaogang.Chen wrote: From: Xiaogang Chen During migration to vram prange->offset is valid after vram buffer is located, either use old one or allocate a new one. Move svm_range_vram_node_new before migrate for each vma to

Re: [PATCH v2] drm/amdkfd: Cal vram offset in TTM resource for each svm_migrate_copy_to_vram

2023-03-01 Thread Chen, Xiaogang
On 3/1/2023 12:54 PM, Felix Kuehling wrote: Am 2023-03-01 um 11:34 schrieb Xiaogang.Chen: From: Xiaogang Chen svm_migrate_ram_to_vram migrates a prange from sys ram to vram. The prange may cross multiple vma. Need remember current dst vram offset in the TTM resource for each migration.

Re: [PATCH 2/2] drm/amdgpu: Synchronize after mapping into a compute VM

2023-02-26 Thread Chen, Xiaogang
On 2/24/2023 5:36 PM, Felix Kuehling wrote: Compute VMs use user mode queues for command submission. They cannot use a CS ioctl to synchronize with pending

Re: [PATCH] drm/amdkfd: Prevent user space using both svm and kfd api to register same user buffer

2023-02-07 Thread Chen, Xiaogang
On 2/7/2023 2:48 PM, Felix Kuehling wrote: Am 2023-02-07 um 15:35 schrieb Xiaogang.Chen: From: Xiaogang Chen When xnack is on user space can use svm page restore to set a vm range without setup it first, then use regular api to register. Currently kfd api and svm are not interoperable.

Re: [PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-01-13 Thread Chen, Xiaogang
On 1/13/2023 4:26 PM, Felix Kuehling wrote: On 2023-01-12 17:41, Chen, Xiaogang wrote: On 1/11/2023 7:31 PM, Felix Kuehling wrote: Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable proper multi-GPU attachment to multiple VMs

Re: [PATCH 6/6] drm/amdgpu: Do bo_va ref counting for KFD BOs

2023-01-13 Thread Chen, Xiaogang
Reviewed-by: Xiaogang Chen Regards Xiaogang On 1/11/2023 7:31 PM, Felix Kuehling wrote: This is needed to correctly handle BOs imported into the GEM API, which would otherwise get added twice to the same VM. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

Re: [PATCH 5/6] drm/amdgpu: update mappings not managed by KFD

2023-01-13 Thread Chen, Xiaogang
Reviewed-by: Xiaogang Chen Regards Xiaogang On 1/11/2023 7:31 PM, Felix Kuehling wrote: When restoring after an eviction, use amdgpu_vm_handle_moved to update BO VA mappings in KFD VMs that are not managed through the KFD API. This should allow using the render node API to create more

Re: [PATCH 4/6] drm/amdgpu: Attach eviction fence on alloc

2023-01-13 Thread Chen, Xiaogang
Reviewed-by: Xiaogang Chen Regards Xiaogang On 1/11/2023 7:31 PM, Felix Kuehling wrote: Instead of attaching the eviction fence when a KFD BO is first mapped, attach it when it is allocated or imported. This in preparation to allow KFD BOs to be mapped using the render node API.

Re: [PATCH 3/6] drm/amdkfd: Improve amdgpu_vm_handle_moved

2023-01-13 Thread Chen, Xiaogang
Acked-by: Xiaogang Chen Regards Xiaogang On 1/11/2023 7:31 PM, Felix Kuehling wrote: Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful for handling extra BO VA mappings in KFD VMs that are managed through the render node API.

Re: [PATCH 2/6] drm/amdkfd: Implement DMA buf fd export from KFD

2023-01-13 Thread Chen, Xiaogang
Reviewed-by: Xiaogang Chen Regards Xiaogang On 1/11/2023 7:31 PM, Felix Kuehling wrote: Exports a DMA buf fd of a given KFD buffer handle. This is intended for being able to import KFD BOs into GEM contexts to leverage the amdgpu_bo_va API for more flexible virtual address mappings. It

Re: [PATCH 1/6] drm/amdgpu: Generalize KFD dmabuf import

2023-01-12 Thread Chen, Xiaogang
On 1/11/2023 7:31 PM, Felix Kuehling wrote: Use proper amdgpu_gem_prime_import function to handle all kinds of imports. Remember the dmabuf reference to enable proper multi-GPU attachment to multiple VMs without erroneously re-exporting the underlying BO multiple times. Signed-off-by: Felix

Re: [PATCH] drm/amdgpu: Add notifier lock for KFD userptrs

2022-12-08 Thread Chen, Xiaogang
On 12/5/2022 6:54 PM, Felix Kuehling wrote: Add a per-process MMU notifier lock for processing notifiers from userptrs. Use that lock to properly synchronize page table updates with MMU notifiers. v2: rebased Signed-off-by: Felix Kuehling Reviewed-by: Xiaogang Chen (v1) This patch is

RE: [PATCH] drm/amdgpu: Add notifier lock for KFD userptrs

2022-11-17 Thread Chen, Xiaogang
[AMD Official Use Only - General] This patch is: Reviewed-by: Xiaogang Chen -Original Message- From: amd-gfx On Behalf Of Felix Kuehling Sent: Wednesday, November 2, 2022 9:00 PM To: amd-gfx@lists.freedesktop.org Subject: [PATCH] drm/amdgpu: Add notifier lock for KFD userptrs

RE: [PATCH] drm/amdkfd: Handle restart of kfd_ioctl_wait_events

2022-08-09 Thread Chen, Xiaogang
[AMD Official Use Only - General] This patch is: Reviewed-and-Tested-by: Xiaogang Chen -Original Message- From: Kuehling, Felix Sent: Thursday, August 4, 2022 5:29 PM To: amd-gfx@lists.freedesktop.org Cc: Chen, Xiaogang ; Curtis, Nicholas Subject: [PATCH] drm/amdkfd: Handle restart

Re: [PATCH] drm/amdkfd: explicitly create/destroy queue attributes under /sys

2021-12-10 Thread Chen, Xiaogang
On 12/10/2021 10:49 AM, Felix Kuehling wrote: On 2021-12-10 2:22 a.m., Christian König wrote: Am 09.12.21 um 23:27 schrieb Felix Kuehling: Am 2021-12-09 um 5:14 p.m. schrieb Chen, Xiaogang: On 12/9/2021 12:40 PM, Felix Kuehling wrote: Am 2021-12-09 um 2:49 a.m. schrieb Xiaogang.Chen: From

Re: [PATCH] drm/amdkfd: explicitly create/destroy queue attributes under /sys

2021-12-09 Thread Chen, Xiaogang
On 12/9/2021 12:40 PM, Felix Kuehling wrote: Am 2021-12-09 um 2:49 a.m. schrieb Xiaogang.Chen: From: Xiaogang Chen When application is about to finish it destroys queues it has created by an ioctl. Driver deletes queue entry(/sys/class/kfd/kfd/proc/pid/queues/queueid/) which is directory

Re: [PATCH 2/2] drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work

2021-01-22 Thread Chen, Xiaogang
On 1/19/2021 4:29 PM, Grodzovsky, Andrey wrote: On 1/15/21 2:21 AM, Chen, Xiaogang wrote: On 1/14/2021 1:24 AM, Grodzovsky, Andrey wrote: On 1/14/21 12:11 AM, Chen, Xiaogang wrote: On 1/12/2021 10:54 PM, Grodzovsky, Andrey wrote: On 1/4/21 1:01 AM, Xiaogang.Chen wrote: From: Xiaogang Chen

Re: [PATCH 2/2] drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work

2021-01-14 Thread Chen, Xiaogang
On 1/14/2021 1:24 AM, Grodzovsky, Andrey wrote: On 1/14/21 12:11 AM, Chen, Xiaogang wrote: On 1/12/2021 10:54 PM, Grodzovsky, Andrey wrote: On 1/4/21 1:01 AM, Xiaogang.Chen wrote: From: Xiaogang Chen

Re: [PATCH 2/2] drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work

2021-01-13 Thread Chen, Xiaogang
On 1/12/2021 10:54 PM, Grodzovsky, Andrey wrote: On 1/4/21 1:01 AM, Xiaogang.Chen wrote: From: Xiaogang Chen amdgpu DM handles INTERRUPT_LOW_IRQ_CONTEXT interrupt(hpd, hpd_rx) by using work queue and uses single work_struct. If previous interrupt has not been

RE: [PATCH 2/2] drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work

2021-01-12 Thread Chen, Xiaogang
[AMD Official Use Only - Internal Distribution Only] Would you give review? Thanks Xiaogang PS: Remove drm mailing list as this patch addresses amd display specific. -Original Message- From: Chen, Xiaogang Sent: Tuesday, January 12, 2021 12:38 AM To: amd-gfx@lists.freedesktop.org

RE: [PATCH 2/2] drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work

2021-01-11 Thread Chen, Xiaogang
[AMD Official Use Only - Internal Distribution Only] Any comment? -Original Message- From: Xiaogang.Chen Sent: Monday, January 4, 2021 12:02 AM To: amd-gfx@lists.freedesktop.org; Wentland, Harry ; dri-de...@lists.freedesktop.org; airl...@linux.ie Cc: Chen, Xiaogang Subject: [PATCH 2

RE: [PATCH 1/2] drm: distinguish return value of drm_dp_check_and_send_link_address.

2021-01-11 Thread Chen, Xiaogang
[AMD Official Use Only - Internal Distribution Only] Any comment? -Original Message- From: Xiaogang.Chen Sent: Monday, January 4, 2021 12:02 AM To: amd-gfx@lists.freedesktop.org; Wentland, Harry ; dri-de...@lists.freedesktop.org; airl...@linux.ie Cc: Chen, Xiaogang Subject: [PATCH 1