Fix a circular lock dependency exposed under userptr memory pressure.
The DQM lock is the only one taken inside the MMU notifier. We need
to make sure that no reclaim is done under this lock, and that
no other locks are taken under which reclaim is possible.
Signed-off-by: Felix Kuehling
From: Yong Zhao
This avoids duplicated code.
Signed-off-by: Yong Zhao
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 29 +++
1 file changed, 11 insertions(+), 18 deletions(-)
diff --git
From: Jay Cornwall
ttmp[4:5] is initialized by the SPI with SPI_GDBG_TRAP_DATA* values.
These values are more useful to the debugger than ttmp[14:15], which
carries dispatch_scratch_base*. There are too few registers to
preserve both.
Signed-off-by: Jay Cornwall
Reviewed-by: Felix Kuehling
From: shaoyunl
There is a bug found in vml2 xgmi logic:
mtype is always sent as NC on the VMC to TC interface for a page walk,
regardless of whether the request is being sent to local or remote GPU.
NC means non-coherent and will cause the VMC return data to be cached
in the TCC (versus UC –
From: Oak Zeng
Existing QUEUE_TYPE_SDMA means PCIe-optimized SDMA queues.
Introduce a new QUEUE_TYPE_SDMA_XGMI, which is optimized
for non-PCIe transfers such as XGMI.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
From: Oak Zeng
MEC FW for some new ASICs requires all SDMA MQDs to be in a contiguous
trunk of memory right after the HIQ MQD. Add a field in the device queue
manager to hold the HIQ/SDMA MQD memory object and allocate the MQD trunk
on device queue manager initialization.
Signed-off-by: Oak Zeng
From: Jay Cornwall
If instruction fetch fails the wave cannot be halted and returned to
the shader without raising MEM_VIOL again. Currently the wave is
terminated if this occurs, but this loses information about the cause
of the fault. The debugger would prefer the faulting wave state to be
From: Oak Zeng
Instead of allocating the HIQ and SDMA MQDs from the sub-allocator,
allocate them from an MQD trunk pool. This is done for all ASICs.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 49 +++
From: Oak Zeng
sdma_queue_id is the SDMA queue index inside one SDMA engine.
sdma_id is the SDMA queue index among all SDMA engines. Use
these two names properly.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
From: Oak Zeng
Support a maximum of 64 SDMA queues.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 +-
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +-
2 files changed, 6
From: Oak Zeng
With the introduction of the new MQD allocation scheme for the HIQ,
the DIQ and HIQ use different MQD allocation schemes, so the DIQ
can't reuse the HIQ MQD manager.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
From: Oak Zeng
Free mqd_mem_obj if GTT buffer allocation for the MQD+control stack fails.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git
From: Oak Zeng
Add debug messages during SDMA queue allocation.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 +++
1 file changed, 3 insertions(+)
diff --git
From: Kent Russell
GTT size is currently limited to the minimum of VRAM size or 3/4 of
system memory. This severely limits the quantity of system memory
that can be used by ROCm applications.
Increase GTT size to the maximum of VRAM size or system memory size.
Signed-off-by: Kent Russell
From: Amber Lin
A multi-socket server can have multiple PCIe segments, so the BDF is not
enough to distinguish each GPU. Also take the domain number into account
when generating the gpu_id.
Signed-off-by: Amber Lin
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
From: Oak Zeng
Alloc format was never really supported by MEC FW. FW always
does one per pipe allocation.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 2 --
From: Oak Zeng
This is preparation work to introduce more MQD allocation
schemes.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
.../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 20 ++--
.../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 51
Assorted KFD changes that have been accumulating on amd-kfd-staging. New
features and fixes included:
* Support for VegaM
* Support for systems with multiple PCI domains
* New SDMA queue type that's optimized for XGMI links
* SDMA MQD allocation changes to support future ASICs with more SDMA
From: Oak Zeng
FW of some new ASICs requires the SDMA MQD size to be no more than
128 dwords. Repurpose the last 2 reserved fields of the SDMA MQD for
driver-internal use, so the total MQD size is no bigger than 128
dwords.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix
From: Oak Zeng
Also initialize mqd size on mqd manager initialization
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 4
From: Kent Russell
Add the VegaM information to KFD
Signed-off-by: Kent Russell
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 5 +
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 20 +++
From: Oak Zeng
Global function mqd_manager_init just calls asic-specific functions and it
is not necessary. Delete it and introduce a mqd_manager_init interface in
dqm for asic-specific mqd manager init. Call mqd_manager_init interface
directly to initialize mqd manager
Signed-off-by: Oak Zeng
From: Oak Zeng
Expose available numbers of both SDMA queue types in the topology.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 7 +++
drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 2 ++
2 files changed, 9
From: Oak Zeng
Previously, mqd managers were initialized on demand. As there
are only a few types of mqd managers, on-demand initialization
doesn't save much memory. Initialize them on device
queue initialization instead and delete the get_mqd_manager
interface. This makes the code more
From: Jay Cornwall
SQ_WAVE_IB_STS.RCNT grew from 4 bits to 5 in gfx9. Do not truncate
when saving in the high bits of TTMP1.
Signed-off-by: Jay Cornwall
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 12 ++--
From: Oak Zeng
Previous code assumes there are two SDMA engines.
This is not true, e.g., Raven only has 1 SDMA engine.
Fix the issue by using the SDMA engine number info in
device_info.
Signed-off-by: Oak Zeng
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
---
From: Harish Kasiviswanathan
Fix compute profile switching on process termination.
Add a dedicated reference counter to keep track of entry/exit to/from
compute profile. This enables switching compute profiles for other
reasons than process creation or termination.
Signed-off-by: Harish
One more nit-pick inline.
On 2019-04-23 4:59 p.m., Zeng, Oak wrote:
> Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL
> to an empty page in mmio space. We will later map this page to process
> space so application can flush hdp. This can't be done properly at
> those
It seems to me that amdgpu_hive_info is a driver-internal structure, but
the psp_xgmi_topology structures are an interface with the PSP that may
change in future ASIC generations. So on second thought, adding the
psp_xgmi_topology structures to the psp_xgmi_context (or
amdgpu_hive_info) like
See inline.
On 2019-04-23 3:23 p.m., Zeng, Oak wrote:
> Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL
> to an empty page in mmio space. We will later map this page to process
> space so application can flush hdp. This can't be done properly at
> those registers' original
On 2019-04-17 2:59 p.m., Liu, Shaoyun wrote:
> Upper level runtime need the xgmi hops info to determine the data path
>
> Change-Id: I969b419eab125157e223e9b03980ca229c1e6af4
> Signed-off-by: shaoyunl
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 8 ++--
>
See inline.
On 2019-04-17 2:58 p.m., Liu, Shaoyun wrote:
> KFD need to provide the info for upper level to determine the data path
>
> Change-Id: Idc809e8f3381b9222dd7be96539522d440f3ee7d
> Signed-off-by: shaoyunl
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +++
>
Adding dri-devel
On 2019-04-17 6:15 p.m., Yang, Philip wrote:
> After patch "drm: Use the same mmap-range offset and size for GEM and
> TTM", application failed to create bo of system memory because drm
> mmap_range size decrease to 64GB from original 1TB. This is not big
> enough for
>
> On 2019-04-17 5:06 p.m., Kuehling, Felix wrote:
>> On 2019-04-17 4:54 p.m., Zhao, Yong wrote:
>>> The packet manager is only needed for HWS mode, as well as Hawaii in non
>>> HWS mode. So only initialize it under those scenarios. This is useful
>>> especially
On 2019-04-17 10:20 a.m., Zeng, Oak wrote:
> Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL
> to an empty page in mmio space. We will later map this page to process
> space so application can flush hdp. This can't be done properly at
> those registers' original location
On 2019-04-17 12:20 p.m., Deucher, Alexander wrote:
>> -Original Message-
>> From: amd-gfx On Behalf Of
>> Zeng, Oak
>> Sent: Wednesday, April 17, 2019 10:21 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander ; Kuehling, Felix
>
On 2019-04-17 4:54 p.m., Zhao, Yong wrote:
> The packet manager is only needed for HWS mode, as well as Hawaii in non
> HWS mode. So only initialize it under those scenarios. This is useful
> especially for emulation environment when things are slow.
I never thought of packet manager
nel code to remove here. All I was saying was, that it's not a high
priority to add the kernel code to populate CPU cache information in
kernel mode.
Regards,
Felix
>
> Regards,
> Christian.
>
> On 16.04.19 at 05:24, Kuehling, Felix wrote:
>>
>> On x86
This is a nice cleanup.
With this change, kfd2kgd_calls.get_fw_version is no longer used. You
should remove it from kgd_kfd_interface.h. Also move the enum
kgd_engine_type to amdgpu_amdkfd.h at the same time.
With that fixed, this patch is Reviewed-by: Felix Kuehling
On 2019-04-12 4:10
On x86 we use the apicid to associate caches with CPU cores. See the Thunk code
in libhsakmt/src/topology.c (static void find_cpu_cache_siblings()). If we used
a different way to identify CPU cores, I think that would break. This code in
the Thunk is x86-specific as it uses the CPUID
How does forcing DPM levels work in SRIOV? Can clocks switch fast enough to
allow different VFs have different clocks? If not, can one VF override the
clocks used by another VF? In that case, wouldn't that violate the isolation
between VFs?
Regards,
Felix
-Original Message-
From:
On 2019-04-03 1:24 p.m., Koenig, Christian wrote:
> On 01.04.19 at 20:58, Kuehling, Felix wrote:
>> On 2019-04-01 2:03 p.m., Christian König wrote:
>>> On 01.04.19 at 19:59, Kuehling, Felix wrote:
>>>> On 2019-04-01 7:23 a.m., Christian König wrote:
>>>&
On 2019-04-02 10:37 a.m., Andrey Konovalov wrote:
> On Mon, Mar 25, 2019 at 11:21 PM Kuehling, Felix
> wrote:
>> On 2019-03-20 10:51 a.m., Andrey Konovalov wrote:
>>> This patch is a part of a series that extends arm64 kernel ABI to allow to
>>> pass tagged user p
On 2019-04-02 10:29 a.m., Paul E. McKenney wrote:
> Having DEFINE_SRCU() or DEFINE_STATIC_SRCU() in a loadable module
> requires that the size of the reserved region be increased, which is
> not something we really want to be doing. This commit therefore removes
> the DEFINE_STATIC_SRCU() from
On 2019-04-01 2:03 p.m., Christian König wrote:
> On 01.04.19 at 19:59, Kuehling, Felix wrote:
>> On 2019-04-01 7:23 a.m., Christian König wrote:
>>> On 30.03.19 at 01:41, Kuehling, Felix wrote:
>>>> Patches 1-3 are Reviewed-by: Felix Kuehling
>>>
On 2019-04-01 7:23 a.m., Christian König wrote:
> On 30.03.19 at 01:41, Kuehling, Felix wrote:
>> Patches 1-3 are Reviewed-by: Felix Kuehling
>
> Thanks.
>
>>
>> About the direct mode, that removes a bunch of synchronization, so it
>> must make some assumption
Patches 1-3 are Reviewed-by: Felix Kuehling
About the direct mode, that removes a bunch of synchronization, so it
must make some assumptions about the state of the page tables. What
makes that safe? Is it safe to use direct-mode on a
per-page-table-update basis? Or do all page table updates
On 2019-03-28 4:38 p.m., Liu, Shaoyun wrote:
> Avoid unnecessary XGMI high pstate trigger when mapping non-vram memory for
> peer device
>
> Change-Id: I1881deff3da19f1f4b58d5765db03a590092a5b2
> Signed-off-by: shaoyunl
This patch is Reviewed-by: Felix Kuehling
Please also give Christian a
nly makes sense inside the loop. The amdgpu_vm_bo_base
should tell you the device that's mapping and potentially accessing the
memory over XGMI. You could get it like this:
mapping_adev = base->vm->root.base.bo->tbo.bdev;
Regards,
Felix
>
> Regards
>
> shaoyun.liu
The change looks reasonable to me. Acked-by: Felix Kuehling
I just don't understand why the root PD is special and handled
differently from other PDs and PTs.
Regards,
Felix
On 2019-03-27 6:39 a.m., Christian König wrote:
> Instead of skipping the root PD while processing the relocated
On 2019-03-28 1:55 p.m., Liu, Shaoyun wrote:
> Avoid unnecessary XGMI high pstate trigger when mapping non-vram memory for
> peer device
>
> Change-Id: I1881deff3da19f1f4b58d5765db03a590092a5b2
> Signed-off-by: shaoyunl
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 13 +
>
On 2019-03-26 4:35 p.m., Liu, Shaoyun wrote:
> Avoid unnecessary XGMI high pstate trigger when mapping non-vram memory for
> peer device
>
> Change-Id: I1881deff3da19f1f4b58d5765db03a590092a5b2
> Signed-off-by: shaoyunl
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 +++
>
On 2019-03-26 2:54 p.m., Liu, Shaoyun wrote:
> Avoid unnecessary XGMI high pstate trigger when mapping non-vram memory for
> peer device
>
> Change-Id: I1881deff3da19f1f4b58d5765db03a590092a5b2
> Signed-off-by: shaoyunl
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +
>
-
> *From:* amd-gfx on behalf of
> Kuehling, Felix
> *Sent:* March 25, 2019 6:28:32 PM
> *To:* Liu, Shaoyun; amd-gfx@lists.freedesktop.org
> *Subject:* Re: [PATCH] drm/amdgpu: XGMI pstate switch initial support
> I don't see any check for the memory type. As far as I can tell you'll
>
The series is Reviewed-by: Felix Kuehling
On 2019-03-25 8:22 a.m., Christian König wrote:
> Clean that up further and also fix another case where the BO
> wasn't kmapped for CPU based updates.
>
> Signed-off-by: Christian König
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31
On 2019-03-22 12:58 p.m., John Donnelly wrote:
> Hello ,
>
> I am investigating an issue reported by a test group concerning this driver.
> Their test loads and unloads every kernel module included in the 4.14.35
> kernel release. You don't even need an AMD platform. It occurs on any Intel,
>
I don't see any check for the memory type. As far as I can tell you'll
power up XGMI even for system memory mappings. See inline.
On 2019-03-22 3:28 p.m., Liu, Shaoyun wrote:
> Driver vote low to high pstate switch whenever there is an outstanding
> XGMI mapping request. Driver vote high to low
On 2019-03-20 10:51 a.m., Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
>
> amdgpu_ttm_tt_get_user_pages() uses provided user
On 2019-03-25 7:38 a.m., Christian König wrote:
> On 20.03.19 at 12:57, Kuehling, Felix wrote:
>> As far as I can tell, the whole series is a small cleanup and big
>> refactor to enable CPU clearing of PTs without a lot of ugliness or code
>> duplication.
>
> It's a bi
As far as I can tell, the whole series is a small cleanup and big
refactor to enable CPU clearing of PTs without a lot of ugliness or code
duplication. It looks good to me. I haven't reviewed all the moved SDMA
update code to make sure it all works correctly, but at least the
prepare and
cause
> you don't have a lock protecting the hw update itself. E.g. while
> powering down you can add a mapping which needs to power it up again
> and so powering down and powering up race with each other.
That's a good point.
Regards,
Felix
>
> Regards,
> Christian.
>
We discussed a few different approaches before settling on this one.
Maybe it needs some more background. XGMI links are quite power hungry.
Being able to power them down improves performance for power-limited
workloads that don't need XGMI. In machine learning, pretty much all
workloads are
Alex already applied an equivalent patch by Colin King (attached for
reference).
Regards,
Felix
On 3/18/2019 2:05 PM, Gustavo A. R. Silva wrote:
> Assign return value of function amdgpu_bo_sync_wait() to variable ret
> for its further check.
>
> Addresses-Coverity-ID: 1443914 ("Logically
> echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main'
>> | sudo tee /etc/apt/sources.list.d/rocm.list
>> sudo apt install rocm-opencl-dev
>>
>> Also exactly the same issue happens with this board:
>> https://www.gigabyte.com/Motherboard/GA-AB350-Gami
--
> *From:* amd-gfx on behalf of Lauri
> Ehrenpreis <lauri...@gmail.com>
> *Sent:* Tuesday, March 12, 2019 5:31 PM
> *To:* Kuehling, Felix
> *Cc:* Tom St Denis; amd-gfx@lists.freedesktop.
The series is Reviewed-by: Felix Kuehling
On 2019-03-13 9:44 a.m., Christian König wrote:
> Now that we have re-reoute faults to the other IH
> ring we can enable retries again.
>
> Signed-off-by: Christian König
> ---
> drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 2 +-
>
This patch is Reviewed-by: Felix Kuehling
Regards,
Felix
On 3/12/2019 9:17 PM, Yang, Philip wrote:
> userptr may cross two VMAs if the forked child process (not call exec
> after fork) malloc buffer, then free it, and then malloc larger size
> buf, kernel will create new VMA adjacent to old
a dedicate SDMA engine
> for PTE update including clear? ). But if we didn't use the same
> engine , it may explain why the test failed occasionally.
>
> Regards
>
> shaoyun.liu
>
>
>
> On 2019-03-12 5:20 p.m., Kuehling, Felix wrote:
>> When page table are upd
>>> Peculiar, I hit it immediately when I ran it. Can you try to use
>>> --gtest_filter=KFDCWSRTest.BasicTest . That one hung every time for me.
>>>
>>> Kent
>>>
>>>> -Original Message-
>>>> From: Christian König
>>>>
When page table are updated by the CPU, synchronize with the
allocation and initialization of newly allocated page tables.
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +---
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git
every time for me.
>>
>>Kent
>>
>>> -Original Message-
>>> From: Christian König
>>> Sent: Tuesday, March 12, 2019 11:09 AM
>>> To: Russell, Kent ; Koenig, Christian
>>> ; Kuehling, Felix ;
>>> amd-gfx@lists.freedesktop.org
>>>
>>> From what I've been able to dig through, the VM Fault seems to
>>> occur right after a doorbell mmap, but that's as far as I got. I can
>>> try to revert it in today's merge and see how things go.
>>>
>>> Kent
>>>
>>>> -Origin
On 2019-03-06 9:42 p.m., Yang, Philip wrote:
> Userptr restore may have concurrent userptr invalidation after
> hmm_vma_fault adds the range to the hmm->ranges list, needs to call
> hmm_vma_range_done to remove the range from hmm->ranges list first,
> then reschedule the restore worker. Otherwise
See one comment inline. There are still some potential problems that
you're not catching.
On 2019-03-06 9:42 p.m., Yang, Philip wrote:
> userptr may cross two VMAs if the forked child process (not call exec
> after fork) malloc buffer, then free it, and then malloc larger size
> buf, kernel will
49, Russell, Kent wrote:
>> From what I've been able to dig through, the VM Fault seems to occur
>> right after a doorbell mmap, but that's as far as I got. I can try to
>> revert it in today's merge and see how things go.
>>
>> Kent
>>
>>> -Origi
[adding the list back]
I'd suspect a problem related to memory clock. This is an APU where
system memory is shared with the CPU, so if the SMU changes memory
clocks that would affect CPU memory access performance. If the problem
only occurs when OpenCL is running, then the compute power
I think this would break Raven, which only has one SDMA engine.
Regards,
Felix
-Original Message-
From: amd-gfx On Behalf Of Christian
König
Sent: Tuesday, March 12, 2019 8:38 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 2/3] drm/amdgpu: free up the first paging queue
We
My concerns were related to eviction fence handing. It would manifest by
unnecessary eviction callbacks into KFD that aren't cause by real evictions. I
addressed that with a previous patch series that removed the need to remove
eviction fences and add them back around page table updates in
Hmm, that's a clever (and elegant) little data structure. The series is
Reviewed-by: Felix Kuehling
Regards,
Felix
On 3/7/2019 8:28 AM, Christian König wrote:
> Further testing showed that the idea with the chash doesn't work as expected.
> Especially we can't predict when we can remove the
Some comments inline ...
On 3/5/2019 1:09 PM, Yang, Philip wrote:
> userptr may cross two VMAs if the forked child process (not call exec
> after fork) malloc buffer, then free it, and then malloc larger size
> buf, kernel will create new VMA adjacent to old VMA which was cloned
> from parent
Hmm, I'm not sure. This change probably fixes this issue, but there may
be other similar corner cases in other situations where the restore
worker fails and needs to retry. The better place to call untrack in
amdgpu_amdkfd_restore_userptr_worker would be at the very end. Anything
that's left
On 2019-03-05 6:20 a.m., Michel Dänzer wrote:
> From: Michel Dänzer
>
> The compiler pointed out that one if block unintentionally wasn't part
> of the loop:
>
> In file included from ./include/linux/kernfs.h:14,
> from ./include/linux/sysfs.h:16,
> from
One not so obvious change here: The fence on the page table after
clear_bo now waits for clearing both the page table and the shadow. That
may make clearing of page tables appear a bit slower. On the other hand,
if you're clearing a bunch of page tables at once, then difference will
be minimal
Since you're addressing two distinct bugs, please split this into two patches.
For the multiple VMAs, should we generalize that to handle any number of VMAs?
It's not a typical case, but you could easily construct situations with
mprotect where different parts of the same buffer have different
On 2/28/2019 9:56 AM, Christian König wrote:
> On 28.02.19 at 16:32, Russell, Kent wrote:
>> Add 3 files that return:
>> The total amount of VRAM and the current total used VRAM
>> The total amount of VRAM and the current total used visible VRAM
>> The total GTT size and the current total of used
On 2/25/2019 2:58 PM, Thomas Hellstrom wrote:
> On Mon, 2019-02-25 at 14:20 +, Koenig, Christian wrote:
>> Am 23.02.19 um 00:19 schrieb Kuehling, Felix:
>>> Don't account for them in other zones such as dma32. The kernel
>>> page
>>> allocator has its own he
Don't account for them in other zones such as dma32. The kernel page
allocator has its own heuristics to avoid exhausting special zones
for regular kernel allocations.
Signed-off-by: Felix Kuehling
CC: thellst...@vmware.com
CC: christian.koe...@amd.com
---
drivers/gpu/drm/ttm/ttm_memory.c | 6
On 2019-02-22 8:45 a.m., Thomas Hellstrom wrote:
> On Fri, 2019-02-22 at 07:10 +, Koenig, Christian wrote:
>> Am 21.02.19 um 22:02 schrieb Thomas Hellstrom:
>>> Hi,
>>>
>>> On Thu, 2019-02-21 at 20:24 +, Kuehling, Felix wrote:
>>>>
On 2019-02-21 12:34 p.m., Thomas Hellstrom wrote:
> On Thu, 2019-02-21 at 16:57 +0000, Kuehling, Felix wrote:
>> On 2019-02-21 2:59 a.m., Koenig, Christian wrote:
>>> On x86 with HIGHMEM there is no dma32 zone. Why do we need one on
>>>>> x86_64? Can we make
On 2019-02-21 12:48 p.m., Yang, Philip wrote:
> Only select HMM_MIRROR will get kernel config dependency warnings
> if CONFIG_HMM is missing in the config. Add depends on HMM will
> solve the issue.
>
> Add conditional compilation to fix compilation errors if HMM_MIRROR
> is not enabled as HMM
On 2019-02-21 2:59 a.m., Koenig, Christian wrote:
> On x86 with HIGHMEM there is no dma32 zone. Why do we need one on
>>> x86_64? Can we make x86_64 more like HIGHMEM instead?
>>>
>>> Regards,
>>> Felix
>>>
>> IIRC with x86, the kernel zone is always smaller than any dma32 zone,
>> so we'd
On 2019-02-20 6:34 p.m., Jerome Glisse wrote:
> On Wed, Feb 20, 2019 at 10:39:49PM +0000, Kuehling, Felix wrote:
>> On 2019-02-20 5:12 p.m., Jerome Glisse wrote:
>>> On Wed, Feb 20, 2019 at 07:18:17PM +0000, Kuehling, Felix wrote:
>>>> [+Jerome]
>>>>
>&
On 2019-02-20 5:12 p.m., Jerome Glisse wrote:
> On Wed, Feb 20, 2019 at 07:18:17PM +0000, Kuehling, Felix wrote:
>> [+Jerome]
>>
>> Why to we need ZONE_DEVICE. I didn't think this was needed for mirroring
>> CPU page tables to device page tables.
>>
>> ARCH_H
On 2019-02-20 1:41 a.m., Thomas Hellstrom wrote:
> On Tue, 2019-02-19 at 17:06 +0000, Kuehling, Felix wrote:
>> On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:
>>> On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:
>>>> Am 18.02.19 um 10:47 schrieb Thomas He
[+Jerome]
Why to we need ZONE_DEVICE. I didn't think this was needed for mirroring
CPU page tables to device page tables.
ARCH_HAS_HMM depends on (X86_64 || PPC64). Do we have some alternative
for ARM support?
Also, the name ARCH_HAS_HMM looks like it's meant to be selected by the
CPU
I guess we'll need something similar for KFD? I don't think we've ever
intentionally tested KFD with swiotlb. But I've seen some backtraces
with swiotlb in them before. I wonder how badly broken it is ...
Regards,
Felix
On 2019-02-20 8:46 a.m., Christian König wrote:
> Otherwise we can't be
I commented on patches 2 and 3 in separate emails. The rest of the
series is Reviewed-by: Felix Kuehling
On 2019-02-19 8:40 a.m., Christian König wrote:
> Clear the VM PDs/PTs only after initializing all the structures.
>
> Signed-off-by: Christian König
> ---
>
On 2019-02-19 8:40 a.m., Christian König wrote:
> Instead of providing it from outside figure out the ats status in the
> function itself from the data structures.
>
> Signed-off-by: Christian König
One suggestion inline. Other than that this patch is Reviewed-by: Felix
Kuehling
Regards,
Comments inline.
On 2019-02-19 8:40 a.m., Christian König wrote:
> This way we only deal with the real BO in here.
>
> Signed-off-by: Christian König
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 64 --
> 1 file changed, 39 insertions(+), 25 deletions(-)
>
> diff
On 2019-02-19 4:09 p.m., Liu, Shaoyun wrote:
> When DPM for the specific clock is disabled, the driver should still print out
> current clock info for rocm-smi support on vega20
>
> Change-Id: I8669c77bf153caa2cd63a575802eb58747151239
> Signed-off-by: shaoyunl
> ---
>