On Mon, Mar 23, 2020 at 10:14:51PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe
>
> Delete several functions that are never called, fix some desync between
> comments and structure content, toss the now out of date top of file
> header, and move one function only used by hmm.c into hmm.c
From: Jason Gunthorpe
This is v2 of the first simple series with a few additional patches of little
adjustments.
This needs an additional patch to the hmm tester:
diff --git a/tools/testing/selftests/vm/hmm-tests.c
b/tools/testing/selftests/vm/hmm-tests.c
index 033a12c7ab5b6d..da15471a2bbf9a 1
From: Jason Gunthorpe
Using two bools instead of flags return is not necessary and leads to
bugs. Returning a value is easier for the compiler to check and easier to
pass around the code flow.
Convert the two bools into flags and push the change to all callers.
Signed-off-by: Jason Gunthorpe
-
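A minimal sketch of the conversion described above: the helper returns one flags value instead of filling two bool out-parameters. The enum values are the ones quoted from the patch in this thread. demo_need_fault() and its parameters are hypothetical, for illustration only, and are not the kernel's hmm_pte_need_fault().

```c
#include <stdbool.h>

/* Flag values as quoted from the patch under discussion. */
enum {
	HMM_NEED_FAULT = 1 << 0,
	HMM_NEED_WRITE_FAULT = HMM_NEED_FAULT | (1 << 1),
	HMM_NEED_ALL_BITS = HMM_NEED_FAULT | HMM_NEED_WRITE_FAULT,
};

/*
 * Hypothetical helper (an assumption, not kernel code): report which
 * kind of fault is needed as a single return value, so callers can
 * test it, OR it with other results, or pass it on unchanged.
 */
static unsigned int demo_need_fault(bool want_page, bool want_write,
				    bool present, bool writable)
{
	if (!want_page)
		return 0;
	if (!present)
		return want_write ? HMM_NEED_WRITE_FAULT : HMM_NEED_FAULT;
	if (want_write && !writable)
		return HMM_NEED_WRITE_FAULT;
	return 0;
}
```

Because HMM_NEED_WRITE_FAULT includes the HMM_NEED_FAULT bit, a caller that only asks "is any fault needed?" can test the return value against HMM_NEED_FAULT alone.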
From: Jason Gunthorpe
swp_offset() should not be called directly, the wrappers are supposed to
abstract away the encoding of the device_private specific information in
the swap entry.
Reviewed-by: Ralph Campbell
Signed-off-by: Jason Gunthorpe
---
mm/hmm.c | 2 +-
1 file changed, 1 insertion(+
On Mon, Mar 23, 2020 at 10:14:50PM -0300, Jason Gunthorpe wrote:
> +enum {
> + HMM_NEED_FAULT = 1 << 0,
> + HMM_NEED_WRITE_FAULT = HMM_NEED_FAULT | (1 << 1),
> + HMM_NEED_ALL_BITS = HMM_NEED_FAULT | HMM_NEED_WRITE_FAULT,
I have to say I find the compound version of HMM_NEED_WRITE_FAULT
From: Jason Gunthorpe
This code can be compiled when CONFIG_TRANSPARENT_HUGEPAGE is off, so
remove the ifdef.
The function is only ever called under
if (pmd_devmap(pmd) || pmd_trans_huge(pmd))
Which is statically false if !CONFIG_TRANSPARENT_HUGEPAGE, so the compiler
reliably eliminates all
From: Jason Gunthorpe
The checking boils down to some racy check if the pagemap is still
available or not. Instead of checking this, rely entirely on the
notifiers, if a pagemap is destroyed then all pages that belong to it must
be removed from the tables and the notifiers triggered.
Reviewed-by
From: Jason Gunthorpe
Delete several functions that are never called, fix some desync between
comments and structure content, toss the now out of date top of file
header, and move one function only used by hmm.c into hmm.c
Signed-off-by: Jason Gunthorpe
---
include/linux/hmm.h | 104 +-
On Mon, Mar 23, 2020 at 10:14:53PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe
>
> This code can be compiled when CONFIG_TRANSPARENT_HUGEPAGE is off, so
> remove the ifdef.
>
> The function is only ever called under
>
>if (pmd_devmap(pmd) || pmd_trans_huge(pmd))
>
> Which is stat
From: Jason Gunthorpe
The pagewalker does not call most ops with NULL vma, those are all routed
to pte_hole instead.
Thus hmm_vma_fault() is only called with a NULL vma from
hmm_vma_walk_hole(), so hoist the check to there.
Now it is clear that snapshotting with no vma is a HMM_PFN_ERROR as
wit
On Mon, Mar 23, 2020 at 10:14:55PM -0300, Jason Gunthorpe wrote:
> if (pte_none(pte)) {
> required_fault = hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0);
> if (required_fault)
> goto fault;
> + *pfn = range->values[HMM_PFN_NONE];
>
From: Jason Gunthorpe
Now that flags are handled on a fine-grained per-page basis this global
flag is redundant and has a confusing overlap with the pfn_flags_mask and
default_flags.
Normalize the HMM_FAULT_SNAPSHOT behavior into one place. Callers needing
the SNAPSHOT behavior should set a pfn_
From: Jason Gunthorpe
In hmm_vma_handle_pte() and hmm_vma_walk_hugetlb_entry() if fault happens
then -EBUSY will be returned and the pfns input flags will have been
destroyed.
For hmm_vma_handle_pte() set HMM_PFN_NONE only on the success returns that
don't otherwise store to pfns.
For hmm_vma_w
On Mon, Mar 23, 2020 at 10:14:54PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe
>
> swp_offset() should not be called directly, the wrappers are supposed to
> abstract away the encoding of the device_private specific information in
> the swap entry.
>
> Reviewed-by: Ralph Campbell
> Si
>
> +/*
> + * If the valid flag is masked off, and default_flags doesn't set valid, then
> + * hmm_pte_need_fault() always returns 0.
> + */
> +static bool hmm_can_fault(struct hmm_range *range)
> +{
> + return ((range->flags[HMM_PFN_VALID] & range->pfn_flags_mask) |
> + range->de
From: Jason Gunthorpe
Most places that return an error code, like -EFAULT, do not set
HMM_PFN_ERROR, only two places do this.
Resolve this inconsistency by never setting the pfns on an error
exit. This doesn't seem like a worthwhile thing to do anyhow.
If for some reason it becomes important, i
On Mon, Mar 23, 2020 at 10:14:56PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe
>
> Most places that return an error code, like -EFAULT, do not set
> HMM_PFN_ERROR, only two places do this.
>
> Resolve this inconsistency by never setting the pfns on an error
> exit. This doesn't seem li
On Mon, Mar 23, 2020 at 10:14:57PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe
>
> The pagewalker does not call most ops with NULL vma, those are all routed
> to pte_hole instead.
Does ->pte_hole
>
> Thus hmm_vma_fault() is only called with a NULL vma from
> hmm_vma_walk_hole(), so
Hi Sam
Am 13.03.20 um 21:17 schrieb Sam Ravnborg:
> Thomas Zimmermann had made a nice patch-set that introduced
> drm_simple_encoder_init() which is already present in drm-misc-next.
>
> While looking at this it was suddenly obvious to me that
> this was functionality that really should be include
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index f0128f7..0a95b13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/driver
we need to move virt detection much earlier because:
1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always
be at DE5 (dw) mmio offset from vega10, this way there is no
need to implement detect_hw_virt() routine in each nbio/chip file.
for VI SRIOV chip (tonga & fiji), the BIF_IOV_FUNC_IDE
those two headers are not needed for ip discovery
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 27d8ae1..37e1fcf 1
1) SRIOV guest KMD doesn't care training buffer
2) if we reserve the training buffer it will overlap with the IP discovery
reservation because training buffer is at vram_size - 0x8000 and
IP discovery is at ()vram_size - 0x1 => vram_size -1)
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/a
Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_en
Reporting the fw_version just returns 0, the actual version is kept as
ta_*_ucode_version. This is the same as the feature reported in
the amdgpu_firmware_info debugfs file.
Signed-off-by: Kent Russell
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 ++--
1 file changed, 2 insertions(+), 2 del
Ensure that when we memcpy, we don't end up copying more data than
the struct supports. For now, this is 16 characters for product number
and serial number, and 32 chars for product name
Signed-off-by: Kent Russell
---
.../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 21 +++
1 fil
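A minimal sketch of the bounded copy described above, assuming the field sizes given in the commit message (16 for product number and serial number, 32 for product name). struct demo_fru_info and demo_bounded_copy() are hypothetical names, not the driver's actual code. Reserving one byte for a NUL terminator is also an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Field sizes taken from the commit message; the names are hypothetical. */
#define DEMO_PRODUCT_NUMBER_LEN 16
#define DEMO_SERIAL_LEN         16
#define DEMO_PRODUCT_NAME_LEN   32

struct demo_fru_info {
	char product_number[DEMO_PRODUCT_NUMBER_LEN];
	char serial[DEMO_SERIAL_LEN];
	char product_name[DEMO_PRODUCT_NAME_LEN];
};

/*
 * Copy at most dst_size - 1 bytes from src and always NUL-terminate,
 * so a source longer than the field can never overrun the struct.
 * Returns the number of bytes copied, excluding the terminator.
 */
static size_t demo_bounded_copy(char *dst, size_t dst_size,
				const unsigned char *src, size_t src_len)
{
	size_t n;

	if (dst_size == 0)
		return 0;
	n = src_len < dst_size - 1 ? src_len : dst_size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

A caller would pass sizeof() of the destination field, for example demo_bounded_copy(info.serial, sizeof(info.serial), raw, raw_len), so the limit follows the struct definition if the sizes change later.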
[AMD Official Use Only - Internal Distribution Only]
Does this issue occur during gpu recovery?
I just checked the code: a fence timeout will free the job and put its fence, but gpu
recovery might resubmit the job.
Correct me if I am wrong.
From: amd-gfx on behalf of Andrey
This is only for the guilty job which was removed from the
ring_mirror_list due to completion and hence will not be resubmitted by
recovery and will not be freed by the usual flow in
drm_sched_get_cleanup_job (see drm_sched_stop)
Andrey
On 3/24/20 10:45 AM, Pan, Xinhui wrote:
[AMD Official
On Tue, Mar 24, 2020 at 7:42 AM Kent Russell wrote:
>
> Reporting the fw_version just returns 0, the actual version is kept as
> ta_*_ucode_version. This is the same as the feature reported in
> the amdgpu_firmware_info debugfs file.
>
> Signed-off-by: Kent Russell
Reviewed-by: Alex Deucher
>
On Tue, Mar 24, 2020 at 7:49 AM Kent Russell wrote:
>
> Ensure that when we memcpy, we don't end up copying more data than
> the struct supports. For now, this is 16 characters for product number
> and serial number, and 32 chars for product name
>
> Signed-off-by: Kent Russell
Reviewed-by: Alex
On Tue, Mar 24, 2020 at 6:59 AM Monk Liu wrote:
>
> we need to move virt detection much earlier because:
> 1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always
> be at DE5 (dw) mmio offset from vega10, this way there is no
> need to implement detect_hw_virt() routine in each nbio/chip f
On Tue, Mar 24, 2020 at 08:37:46AM +0100, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 10:14:55PM -0300, Jason Gunthorpe wrote:
> > if (pte_none(pte)) {
> > required_fault = hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0);
> > if (required_fault)
> >
Sorry for the messed-up link. This is the link (rocm-smi-lib) which
makes use of the interface
https://github.com/RadeonOpenCompute/rocm_smi_lib
On 2020-03-23 2:19 p.m., Amber Lin wrote:
Somehow my reply didn't seem to reach the mailing list...
Hi Alex,
https://nam11.safelinks.protection.outl
Hi,
I think this series is a good clean-up.
Could you take a look at this please?
On Fri, Feb 14, 2020 at 12:40 AM Masahiro Yamada wrote:
>
> A header include path without $(srctree)/ is suspicious because it does
> not work with O= builds.
>
> You can build drivers/gpu/drm/radeon/ without th
There is a misunderstanding here:
Did you find out why the zero refcount on the finished fence happens
before the fence was signaled ?
The refcount on the finished fence doesn't become zero before it is
signaled, it becomes zero while it is signaled.
CPU 1 calls dma_fence_signal(fence) with
Am 24.03.20 um 12:40 schrieb Nirmoy Das:
Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct a
On Tue, Mar 17, 2020 at 12:03:20PM -0400, Kenny Ho wrote:
> What's your thoughts on this latest series?
My overall impression is that the feedbacks aren't being incorporated thoroughly
/ sufficiently.
Thanks.
--
tejun
Hi Tejun,
Can you elaborate more on what are the missing pieces?
Regards,
Kenny
On Tue, Mar 24, 2020 at 2:46 PM Tejun Heo wrote:
>
> On Tue, Mar 17, 2020 at 12:03:20PM -0400, Kenny Ho wrote:
> > What's your thoughts on this latest series?
>
> My overall impression is that the feedbacks aren't b
On Tue, Mar 24, 2020 at 08:27:12AM +0100, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 10:14:50PM -0300, Jason Gunthorpe wrote:
> > +enum {
> > + HMM_NEED_FAULT = 1 << 0,
> > + HMM_NEED_WRITE_FAULT = HMM_NEED_FAULT | (1 << 1),
> > + HMM_NEED_ALL_BITS = HMM_NEED_FAULT | HMM_NEED_WRITE_FA
Hi guys,
recently I've been tracing some IRQ latencies in a system and the
display handling in amdgpu doesn't really look that good. To be honest
it also doesn't look too bad, but I still want to share my findings
here. The trace below is from a single vblank IRQ with a pageflip.
The most interes
On Tue, Mar 24, 2020 at 08:33:39AM +0100, Christoph Hellwig wrote:
> >
> > +/*
> > + * If the valid flag is masked off, and default_flags doesn't set valid,
> > then
> > + * hmm_pte_need_fault() always returns 0.
> > + */
> > +static bool hmm_can_fault(struct hmm_range *range)
> > +{
> > + ret
On Tue, Mar 24, 2020 at 12:48 PM Masahiro Yamada wrote:
>
> Hi,
>
> I think this series is a good clean-up.
>
> Could you take a look at this please?
Can you resend? I don't seem to have gotten it. Must have ended up
getting flagged as spam or something.
Alex
>
>
>
> On Fri, Feb 14, 2020 at 12
To make sure the CAP feature is supported by the SOS, add SOS FW version
checking before loading the CAP FW.
Change-Id: I7aa1c09f9c117f67ede0db6cd5911d56c8568495
Signed-off-by: Zhigang Luo
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/driver
I see now. In that case the change seems good to me.
Andrey
From: Koenig, Christian
Sent: 24 March 2020 13:58
To: Grodzovsky, Andrey ; Pan, Xinhui
; Tao, Yintian ; Deucher, Alexander
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: hold the r
Reviewed by : Shaoyun.liu
-Original Message-
From: amd-gfx On Behalf Of Zhigang Luo
Sent: Tuesday, March 24, 2020 3:48 PM
To: amd-gfx@lists.freedesktop.org
Cc: Luo, Zhigang
Subject: [PATCH] drm/amdgpu: add SOS FW version checking fo
Support added into IH to enable ring1 and ring2 for navi10_ih.
Signed-off-by: Alex Sierra
---
drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 205 +++--
1 file changed, 189 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
b/drivers/gpu/drm/amd/amd
call psp to program ih cntl in SR-IOV if supported on Navi and Arcturus.
Signed-off-by: Alex Sierra
---
drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 90 +++---
1 file changed, 80 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
b/drivers/gpu/dr
[Why]
Page faults can easily overwhelm the interrupt handler.
So to make sure that we never lose valuable interrupts on the primary ring
we re-route page faults to IH ring 1.
It also facilitates the recovery page process, since it's already
running from a process context.
This is valid for Arct
[Why]
Vega20 and Arcturus asics use oss 5.0 version.
[How]
Replace ih ip block by navi10 for vega20 and arcturus.
Signed-off-by: Alex Sierra
---
drivers/gpu/drm/amd/amdgpu/soc15.c | 11 ++-
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
On Wed, Mar 25, 2020 at 4:42 AM Alex Deucher wrote:
>
> On Tue, Mar 24, 2020 at 12:48 PM Masahiro Yamada wrote:
> >
> > Hi,
> >
> > I think this series is a good clean-up.
> >
> > Could you take a look at this please?
>
> Can you resend? I don't seem to have gotten it. Must have ended up
> gett
Series Reviewed-by: Emily Deng
>-Original Message-
>From: amd-gfx On Behalf Of Monk Liu
>Sent: Tuesday, March 24, 2020 6:59 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk
>Subject: [PATCH 4/4] drm/amdgpu: cleanup all virtualiza
1) modify xgpu_nv_send_access_requests to support
new idh request
2) introduce new function: req_gpu_init_data() which
is used to notify host to prepare vbios/ip-discovery/pfvf exchange
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 13 +
drivers/gpu/drm/amd/amdg
if host support new handshake we only need to enter
fullaccess_mode in ip_init() part, otherwise we need
to do it before reading vbios (because host prepares vbios
for VF only after received REQ_GPU_INIT event under
legacy handshake)
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_
what:
with the new "req_init_data" handshake we need to use mailbox
before doing IP discovery, so in the mxgpu_nv.c file the original
SOC15_REG method won't work because that depends on IP discovery
complete first.
how:
so the solution is to always use static MMIO offset for NV+ mailbox
registers.
HW team
by this new handshake host side can prepare vbios/ip-discovery
and pf&vf exchange data upon receiving this request without
stopping world switch.
this way the world switch is less impacted by VF's exclusive mode
request
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19
1) drop the headers from AI in mxgpu_nv.c, should refer to mxgpu_nv.h
2) the IDH_EVENT_MAX is not used and not aligned with host side
so drop it
3) the IDH_TEXT_MESSAG was provided in host but not defined in guest
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h | 3 ++-
dr
what:
1)move timeout setting before ip_early_init to reduce exclusive mode
cost for SRIOV
2)move ip_discovery_init() to inside of amdgpu_discovery_reg_base_init()
it is a prepare for the later upcoming patches.
why:
in later upcoming patches we would use a new mailbox event --
"req_gpu_init_data"
new idh_request and ihd_event to prepare for the
new handshake protocol implementation later
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h
b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h
inde
Hit panic during GPU recovery test. drm_sched_entity_select_rq might
set NULL to rq. So add a check like drm_sched_job_init does.
Cc: Christian König
Cc: Alex Deucher
Cc: Felix Kuehling
Signed-off-by: xinhui pan
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 2 ++
1 file changed, 2 inserti
Submitting patch to clear xgmi ras error counters in between ras error query
0001-drm-amdgpu-added-xgmi-ras-error-reset-sequence.patch
Description: 0001-drm-amdgpu-added-xgmi-ras-error-reset-sequence.patch
Submitting patch to disable ras debugfs features during the entire GPU reset
cycle
0001-drm-amdgpu-disable-ras-query-during-gpu-reset.patch
Description: 0001-drm-amdgpu-disable-ras-query-during-gpu-reset.patch
+ case CHIP_VEGA20:
+ default:
I'd suggest doing nothing for the default case. Other than that the patch is
Reviewed-by: Hawking Zhang
Regards,
Hawking
From: Clements, John
Sent: Wednesday, March 25, 2020 14:50
To: amd-gfx@
62 matches