Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Koenig, Christian
Am 21.02.19 um 07:47 schrieb Thomas Hellstrom:
> On Wed, 2019-02-20 at 19:23 +, Kuehling, Felix wrote:
>> On 2019-02-20 1:41 a.m., Thomas Hellstrom wrote:
>>> On Tue, 2019-02-19 at 17:06 +, Kuehling, Felix wrote:
 On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:
> On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:
>> Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:
>>> On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:
 Another good question is also why the heck the acc_size
 counts
 towards
 the DMA32 zone?
>>> DMA32 TTM pages are accounted in the DMA32 zone. Other
>>> pages
>>> are
>>> not.
>> Yeah, I'm perfectly aware of this. But this is for the
>> accounting
>> size!
>>
>> We have an accounting for the stuff needed additional to the
>> pages
>> backing the BO (e.g. the page and DMA addr array).
>>
>> And from the bug description it sounds like we use the DMA32
>> zone
>> for
>> this accounting which of course is completely nonsense.
> It's actually accounted in all available zones, since it would
> be
> pretty hard to determine exactly where that memory should be
> accounted.
> In particular if it's vmalloced. It might be DMA32, it might
> not.
> Given
> the objective of stopping malicious user-space from exhausting
> the
> DMA32 zone it was, at the time the code was written, a
> reasonable
> approximation. With ever increasing memory sizes, there might
> be
> better
> solutions?
 As far as I can see, in TTM, ttm_mem_global_alloc is only used
 for
 the
 acc_size in ttm_bo_init_reserved. Other than that vmwgfx also
 seems
 to
 use it to account for a few things that are allocated with
 kmalloc.

 So would a better solution be to change ttm_mem_global_alloc to
 use
 only
 the kernel zone?

>>> IMO we need to determine what functionality to keep and then the
>>> best
>>> solution. The current code does its job, but is obviously too
>>> restrictive. Both of the solutions you suggest open up for
>>> potential
>>> DOS attacks (DMA32 and kernel zones are not mutually exclusive.
>>> They
>>> overlap).
>> On x86 with HIGHMEM there is no dma32 zone. Why do we need one on
>> x86_64? Can we make x86_64 more like HIGHMEM instead?
>>
>> Regards,
>> Felix
>>
> IIRC with x86, the kernel zone is always smaller than any dma32 zone,
> so we'd always exhaust the kernel zone before dma32 anyway.
>
> Not sure why we have dma32 on x86 without highmem, though. sounds
> superflous but harmless.

Well DMA32 denotes memory which is accessible by devices who can only do 
32bit addressing. And IIRC we can actually do DMA32 to highmem since 
something like 2.4.*.

Because of this it is actually irrelevant if you have highmem or not, 
what matters for DMA32 is if you have an IOMMU or not.

So even on x86_64 you actually do need the DMA32 zone if you don't have 
an IOMMU which remaps all memory for devices which can't directly 
address it.

Regards,
Christian.

>
> /Thomas
>
>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Thomas Hellstrom
On Wed, 2019-02-20 at 19:23 +, Kuehling, Felix wrote:
> On 2019-02-20 1:41 a.m., Thomas Hellstrom wrote:
> > On Tue, 2019-02-19 at 17:06 +, Kuehling, Felix wrote:
> > > On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:
> > > > On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:
> > > > > Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:
> > > > > > On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:
> > > > > > > Another good question is also why the heck the acc_size
> > > > > > > counts
> > > > > > > towards
> > > > > > > the DMA32 zone?
> > > > > > DMA32 TTM pages are accounted in the DMA32 zone. Other
> > > > > > pages
> > > > > > are
> > > > > > not.
> > > > > Yeah, I'm perfectly aware of this. But this is for the
> > > > > accounting
> > > > > size!
> > > > > 
> > > > > We have an accounting for the stuff needed additional to the
> > > > > pages
> > > > > backing the BO (e.g. the page and DMA addr array).
> > > > > 
> > > > > And from the bug description it sounds like we use the DMA32
> > > > > zone
> > > > > for
> > > > > this accounting which of course is completely nonsense.
> > > > It's actually accounted in all available zones, since it would
> > > > be
> > > > pretty hard to determine exactly where that memory should be
> > > > accounted.
> > > > In particular if it's vmalloced. It might be DMA32, it might
> > > > not.
> > > > Given
> > > > the objective of stopping malicious user-space from exhausting
> > > > the
> > > > DMA32 zone it was, at the time the code was written, a
> > > > reasonable
> > > > approximation. With ever increasing memory sizes, there might
> > > > be
> > > > better
> > > > solutions?
> > > As far as I can see, in TTM, ttm_mem_global_alloc is only used
> > > for
> > > the
> > > acc_size in ttm_bo_init_reserved. Other than that vmwgfx also
> > > seems
> > > to
> > > use it to account for a few things that are allocated with
> > > kmalloc.
> > > 
> > > So would a better solution be to change ttm_mem_global_alloc to
> > > use
> > > only
> > > the kernel zone?
> > > 
> > IMO we need to determine what functionality to keep and then the
> > best
> > solution. The current code does its job, but is obviously too
> > restrictive. Both of the solutions you suggest open up for
> > potential
> > DOS attacks (DMA32 and kernel zones are not mutually exclusive.
> > They
> > overlap).
> On x86 with HIGHMEM there is no dma32 zone. Why do we need one on 
> x86_64? Can we make x86_64 more like HIGHMEM instead?
> 
> Regards,
>Felix
> 

IIRC with x86, the kernel zone is always smaller than any dma32 zone,
so we'd always exhaust the kernel zone before dma32 anyway.

Not sure why we have dma32 on x86 without highmem, though. sounds
superflous but harmless.

/Thomas



> 
> > 
> > /Thomas
> > 
> > 
> > 
> > 
> > > Regards,
> > > Felix
> > > 
> > > 
> > > > /Thomas
> > > > 
> > > > > Christian.
> > > > > 
> > > > > > For small persistent allocations using
> > > > > > ttm_mem_global_alloc(),
> > > > > > they
> > > > > > are
> > > > > > accounted also in the DMA32 zone, which may cause over-
> > > > > > accounting
> > > > > > of
> > > > > > that zone, but that's pretty unlikely to be a big problem..
> > > > > > 
> > > > > > /Thomas
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > In other words why should the internal bookkeeping pages
> > > > > > > be
> > > > > > > allocated
> > > > > > > in
> > > > > > > the DMA32 zone?
> > > > > > > 
> > > > > > > That doesn't sounds valid to me in any way,
> > > > > > > Christian.
> > > > > > > 
> > > > > > > Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:
> > > > > > > > Hmm,
> > > > > > > > 
> > > > > > > > This zone was intended to stop TTM page allocations
> > > > > > > > from
> > > > > > > > exhausting
> > > > > > > > the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by
> > > > > > > > default,
> > > > > > > > which
> > > > > > > > means if we drop this check, other devices may stop
> > > > > > > > functioning
> > > > > > > > unexpectedly?
> > > > > > > > 
> > > > > > > > However, in the end I'd expect the kernel page
> > > > > > > > allocation
> > > > > > > > system
> > > > > > > > to
> > > > > > > > make sure there are some pages left in the DMA32 zone,
> > > > > > > > otherwise
> > > > > > > > random non-IO page allocations would also potentially
> > > > > > > > exhaust
> > > > > > > > the
> > > > > > > > DMA32 zone without anybody caring, which means removing
> > > > > > > > this
> > > > > > > > zone
> > > > > > > > wouldn't be any worse than whatever other subsystems
> > > > > > > > may be
> > > > > > > > doing
> > > > > > > > already...
> > > > > > > > 
> > > > > > > > /Thomas
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On 2/16/19 12:02 AM, Kuehling, Felix wrote:
> > > > > > > > > This is an RFC. I'm not sure this is the right
> > > > > > > > > solution,
> > > > > > > > > but
> > > > > > > > > it
> > > > > > > > > highlights the problem I'm trying to solve.
> > > > > > > > > 
> > > > > 

Re: s2idle not working

2019-02-20 Thread Alex Deucher
On Wed, Feb 20, 2019 at 11:56 PM shahul hameed
 wrote:
>
> Hi All,
>
> I did below exoeriments:
>
> -> S2idle is working in software rendering mode(disable AMDGPU driver).
>
> -> For experiment purpose i commented suspend/resume calls of amdgpu driver, 
> in this case S2idle worked.
>
> With above experiments it is clear that my platform supports s2idel.
> Some issue in AMDGPU suspend/resume sequence for s2idle.
> So can you please help me.

I'd suggest filing a bug (https://bugs.freedesktop.org) and attaching
your full dmesg output.  I'm not too familiar with s2idle and what
pmops paths it uses.  I suspect what is happening is that the power is
not getting fully cut the the device when s2idle is invoked, and it's
left in a bad state on resume.  Maybe something like the attached
patch might help.  That said, if the platform is not properly cutting
power to the device at s2idle time, you probably aren't saving any
power anyway.

Alex

>
> Thanks & Regards,
> Sk shahul.
>
> On Thu, Feb 21, 2019 at 3:32 AM Alex Deucher  wrote:
>>
>> On Wed, Feb 20, 2019 at 4:59 PM shahul hameed
>>  wrote:
>> >
>> > Hi
>> >
>> > I am facing below issue. Can any one please help me to resolve it.
>>
>> As I mentioned previously, please make sure the platform supports
>> s2idle and that it is enabled in the sbios.  Does s2idle work on other
>> devices and the platform itself?
>>
>> Alex
>>
>> >
>> > Regards,
>> > Sk shahul
>> >
>> > -- Forwarded message -
>> > From: shahul hameed 
>> > Date: Wed, Feb 20, 2019, 3:13 PM
>> > Subject: s2idle not working
>> > To: 
>> >
>> >
>> > Hi Shirish
>> >
>> > I am porting Android_N and keenel 4.19.2 to Ryzen platform. Graphic card 
>> > is gfx9.
>> > Porting is done successfully. Now i am working on suspend/resume.
>> > Suspend/resume is working fine in deep mode.
>> > But suspend/resume is not working in s2idle (Suspend to idle) mode.
>> >
>> > during resume i found error in gpu driver.
>> >
>> > [  105.862161] [drm:gfx_v9_0_hw_init [amdgpu]] *ERROR* KCQ enable failed 
>> > (scratch(0xC040)=0xCAFEDEAD)
>> > [  105.862187] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* 
>> > resume of IP block  failed -22
>> > [  105.862210] [drm:amdgpu_device_resume [amdgpu]] *ERROR* 
>> > amdgpu_device_ip_resume failed (-22).
>> > [  105.862215] dpm_run_callback(): pci_pm_resume+0x0/0xd6 returns -22
>> > [  105.862345] PM: Device :03:00.0 failed to resume async: error -22
>> >
>> > same issue is reproduced with ubuntu16.04.
>> >
>> > Can you please help me to resolve this issue.
>> > Thanks in Advance,
>> > Regards,
>> > Sk shahul.
>> > ___
>> > amd-gfx mailing list
>> > amd-gfx@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
From 8a506e026c0913b1b0efab89cb03bf81ad1b3a74 Mon Sep 17 00:00:00 2001
From: Alex Deucher 
Date: Thu, 21 Feb 2019 01:24:15 -0500
Subject: [PATCH] drm/amdgpu: possibly reset device on resume

If the device was left in a bad state on resume form suspend
attempt to reset the GPU.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fcab1fe9bb68..6ab2fb3c0c51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2896,6 +2896,12 @@ int amdgpu_device_resume(struct drm_device *dev, bool resume, bool fbcon)
 			return r;
 	}
 
+	if (!amdgpu_sriov_vf(adev) && amdgpu_asic_need_reset_on_init(adev)) {
+		r = amdgpu_asic_reset(adev);
+		if (r)
+			return r;
+	}
+
 	/* post card */
 	if (amdgpu_device_need_post(adev)) {
 		r = amdgpu_atom_asic_init(adev->mode_info.atom_context);
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: s2idle not working

2019-02-20 Thread shahul hameed
Hi All,
Below error also got

amdgpu :04:00.0: couldn't schedule ib on ring 
[  148.782052] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs
(-22)
[  148.782054] amdgpu :04:00.0: couldn't schedule ib on ring 
[  148.782107] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs
(-22)
[  148.782120] amdgpu :04:00.0: couldn't schedule ib on ring 
[  148.782173] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs
(-22)
[  148.782175] amdgpu :04:00.0: couldn't schedule ib on ring 
[  148.782228] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs
(-22)

Regards,
Skshahul.

On Thu, Feb 21, 2019 at 10:26 AM shahul hameed 
wrote:

> Hi All,
>
> I did below exoeriments:
>
> -> S2idle is working in software rendering mode(disable AMDGPU driver).
>
> -> For experiment purpose i commented suspend/resume calls of amdgpu
> driver, in this case S2idle worked.
> With above experiments it is clear that my platform supports s2idel.
> Some issue in AMDGPU suspend/resume sequence for s2idle.
> So can you please help me.
>
> Thanks & Regards,
> Sk shahul.
>
> On Thu, Feb 21, 2019 at 3:32 AM Alex Deucher 
> wrote:
>
>> On Wed, Feb 20, 2019 at 4:59 PM shahul hameed
>>  wrote:
>> >
>> > Hi
>> >
>> > I am facing below issue. Can any one please help me to resolve it.
>>
>> As I mentioned previously, please make sure the platform supports
>> s2idle and that it is enabled in the sbios.  Does s2idle work on other
>> devices and the platform itself?
>>
>> Alex
>>
>> >
>> > Regards,
>> > Sk shahul
>> >
>> > -- Forwarded message -
>> > From: shahul hameed 
>> > Date: Wed, Feb 20, 2019, 3:13 PM
>> > Subject: s2idle not working
>> > To: 
>> >
>> >
>> > Hi Shirish
>> >
>> > I am porting Android_N and keenel 4.19.2 to Ryzen platform. Graphic
>> card is gfx9.
>> > Porting is done successfully. Now i am working on suspend/resume.
>> > Suspend/resume is working fine in deep mode.
>> > But suspend/resume is not working in s2idle (Suspend to idle) mode.
>> >
>> > during resume i found error in gpu driver.
>> >
>> > [  105.862161] [drm:gfx_v9_0_hw_init [amdgpu]] *ERROR* KCQ enable
>> failed (scratch(0xC040)=0xCAFEDEAD)
>> > [  105.862187] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
>> resume of IP block  failed -22
>> > [  105.862210] [drm:amdgpu_device_resume [amdgpu]] *ERROR*
>> amdgpu_device_ip_resume failed (-22).
>> > [  105.862215] dpm_run_callback(): pci_pm_resume+0x0/0xd6 returns -22
>> > [  105.862345] PM: Device :03:00.0 failed to resume async: error -22
>> >
>> > same issue is reproduced with ubuntu16.04.
>> >
>> > Can you please help me to resolve this issue.
>> > Thanks in Advance,
>> > Regards,
>> > Sk shahul.
>> > ___
>> > amd-gfx mailing list
>> > amd-gfx@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: s2idle not working

2019-02-20 Thread shahul hameed
Hi All,

I did below exoeriments:

-> S2idle is working in software rendering mode(disable AMDGPU driver).

-> For experiment purpose i commented suspend/resume calls of amdgpu
driver, in this case S2idle worked.
With above experiments it is clear that my platform supports s2idel.
Some issue in AMDGPU suspend/resume sequence for s2idle.
So can you please help me.

Thanks & Regards,
Sk shahul.

On Thu, Feb 21, 2019 at 3:32 AM Alex Deucher  wrote:

> On Wed, Feb 20, 2019 at 4:59 PM shahul hameed
>  wrote:
> >
> > Hi
> >
> > I am facing below issue. Can any one please help me to resolve it.
>
> As I mentioned previously, please make sure the platform supports
> s2idle and that it is enabled in the sbios.  Does s2idle work on other
> devices and the platform itself?
>
> Alex
>
> >
> > Regards,
> > Sk shahul
> >
> > -- Forwarded message -
> > From: shahul hameed 
> > Date: Wed, Feb 20, 2019, 3:13 PM
> > Subject: s2idle not working
> > To: 
> >
> >
> > Hi Shirish
> >
> > I am porting Android_N and keenel 4.19.2 to Ryzen platform. Graphic card
> is gfx9.
> > Porting is done successfully. Now i am working on suspend/resume.
> > Suspend/resume is working fine in deep mode.
> > But suspend/resume is not working in s2idle (Suspend to idle) mode.
> >
> > during resume i found error in gpu driver.
> >
> > [  105.862161] [drm:gfx_v9_0_hw_init [amdgpu]] *ERROR* KCQ enable failed
> (scratch(0xC040)=0xCAFEDEAD)
> > [  105.862187] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
> resume of IP block  failed -22
> > [  105.862210] [drm:amdgpu_device_resume [amdgpu]] *ERROR*
> amdgpu_device_ip_resume failed (-22).
> > [  105.862215] dpm_run_callback(): pci_pm_resume+0x0/0xd6 returns -22
> > [  105.862345] PM: Device :03:00.0 failed to resume async: error -22
> >
> > same issue is reproduced with ubuntu16.04.
> >
> > Can you please help me to resolve this issue.
> > Thanks in Advance,
> > Regards,
> > Sk shahul.
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: disable bulk moves for now

2019-02-20 Thread Sasha Levin

On Wed, Feb 20, 2019 at 03:16:06PM +0100, Christian König wrote:

The changes to fix those are two invasive for backporting.

Just disable the feature in 4.20 and 5.0.

Signed-off-by: Christian König 
Cc: [4.20+]


For the sake of having it in the log, could you point to the upstream
commit(s) that fix it?

--
Thanks,
Sasha
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu/powerplay: add missing breaks in polaris10_smumgr

2019-02-20 Thread Quan, Evan
Reviewed-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: 2019年2月21日 11:01
> To: amd-gfx list 
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu/powerplay: add missing breaks in
> polaris10_smumgr
> 
> Ping?
> 
> Alex
> 
> On Mon, Feb 18, 2019 at 5:35 PM Alex Deucher 
> wrote:
> >
> > This was noticed by Gustavo and his -Wimplicit-fallthrough patches.
> > However, in this case, I believe we should have breaks rather than
> > falling though, that said, in practice we should never fall through in
> > the first place so there should be no change in behavior.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> > b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> > index 52abca065764..2d4cfe14f72e 100644
> > --- a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> > +++ b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> > @@ -2330,6 +2330,7 @@ static uint32_t polaris10_get_offsetof(uint32_t
> type, uint32_t member)
> > case DRAM_LOG_BUFF_SIZE:
> > return offsetof(SMU74_SoftRegisters,
> DRAM_LOG_BUFF_SIZE);
> > }
> > +   break;
> > case SMU_Discrete_DpmTable:
> > switch (member) {
> > case UvdBootLevel:
> > @@ -2339,6 +2340,7 @@ static uint32_t polaris10_get_offsetof(uint32_t
> type, uint32_t member)
> > case LowSclkInterruptThreshold:
> > return offsetof(SMU74_Discrete_DpmTable,
> LowSclkInterruptThreshold);
> > }
> > +   break;
> > }
> > pr_warn("can't get the offset of type %x member %x\n", type,
> member);
> > return 0;
> > --
> > 2.20.1
> >
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu/powerplay: add missing breaks in polaris10_smumgr

2019-02-20 Thread Alex Deucher
Ping?

Alex

On Mon, Feb 18, 2019 at 5:35 PM Alex Deucher  wrote:
>
> This was noticed by Gustavo and his -Wimplicit-fallthrough
> patches.  However, in this case, I believe we should have breaks
> rather than falling though, that said, in practice we should
> never fall through in the first place so there should be no
> change in behavior.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c 
> b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> index 52abca065764..2d4cfe14f72e 100644
> --- a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> @@ -2330,6 +2330,7 @@ static uint32_t polaris10_get_offsetof(uint32_t type, 
> uint32_t member)
> case DRAM_LOG_BUFF_SIZE:
> return offsetof(SMU74_SoftRegisters, 
> DRAM_LOG_BUFF_SIZE);
> }
> +   break;
> case SMU_Discrete_DpmTable:
> switch (member) {
> case UvdBootLevel:
> @@ -2339,6 +2340,7 @@ static uint32_t polaris10_get_offsetof(uint32_t type, 
> uint32_t member)
> case LowSclkInterruptThreshold:
> return offsetof(SMU74_Discrete_DpmTable, 
> LowSclkInterruptThreshold);
> }
> +   break;
> }
> pr_warn("can't get the offset of type %x member %x\n", type, member);
> return 0;
> --
> 2.20.1
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Jerome Glisse
On Thu, Feb 21, 2019 at 12:17:33AM +, Kuehling, Felix wrote:
> On 2019-02-20 6:34 p.m., Jerome Glisse wrote:
> > On Wed, Feb 20, 2019 at 10:39:49PM +, Kuehling, Felix wrote:
> >> On 2019-02-20 5:12 p.m., Jerome Glisse wrote:
> >>> On Wed, Feb 20, 2019 at 07:18:17PM +, Kuehling, Felix wrote:
>  [+Jerome]
> 
>  Why to we need ZONE_DEVICE. I didn't think this was needed for mirroring
>  CPU page tables to device page tables.
> 
>  ARCH_HAS_HMM depends on (X86_64 || PPC64). Do we have some alternative
>  for ARM support?
> 
>  Also, the name ARCH_HAS_HMM looks like it's meant to be selected by the
>  CPU architecture rather than any driver. Jerome, do you have any advice?
> >>> This patch is wrong you need to depend on ARCH_HAS_HMM and
> >> Who selects ARCH_HAS_HMM? Currently I don't see this selected anywhere.
> >> So any config option that depends on it will be invisible in menuconfig.
> >> Do we need ARCH_HAS_HMM somewhere in the arch/x86/Kconfig and
> >> arch/powerpc/Kconfig?
> >>
> >> Also, ARCH_HAS_HMM does not currently support ARM. Does that mean we
> >> can't have ARM support in AMDGPU if we start using HMM?
> > ARCH_HAS_HMM is defined by architecture that support HMM. So par x86
> > and PPC. It should not be hard to add it to ARM (i can not remember if
> > ARM has DAX yet or not, if ARM does not have DAX then you need to add
> > that first).
> 
> Not having ARM support is a bummer. I just enabled KFD on ARM a few 
> weeks ago. Now depending on HMM makes KFD unusable on ARM. [+Mark FYI] I 
> hope this is only a temporary setback.

It should not be hard to add in fact all it might need is a Kconfig
patch. I have no easy access to ARM with PCIE so i have not tackle
this yet.

> 
> 
> >> Finally, ARCH_HAS_HMM has a bunch of dependencies. If they are not met,
> >> I guess it can't be enabled. Should those be "select"s instead?
> > No they should not be selected, people configuring their system need
> > to have the freedom of doing so. All those option are selected in all
> > the big distribution.
> As far as I can tell, the arch/x86/Kconfig doesn't select ARCH_HAS_HMM. 
> Its default is "y", so it should be enabled on anything that meets the 
> dependencies. But ZONE_DEVICE was not enabled by default. I think that's 
> what broke our kernel configs.
> 
> We'll fix our own kernel configs to enable ZONE_DEVICE and ARCH_HAS_HMM 
> to get our internal builds to work again.

You seem to be doing weird thing with your kconfig ...

> 
> I suspect other users with their own kernel configs will stumble over 
> this and wonder why KFD and userptr support are disabled in their builds.

Patch to improve kconfig are welcome but they should not force select
thing. Configuration is there to give user freedom to select fewature
they want to give up.

Maybe following would help:
ARCH_HAS_HMM
-   bool
-   default y
+   def_bool y

Cheers,
Jérôme
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Kuehling, Felix
On 2019-02-20 6:34 p.m., Jerome Glisse wrote:
> On Wed, Feb 20, 2019 at 10:39:49PM +, Kuehling, Felix wrote:
>> On 2019-02-20 5:12 p.m., Jerome Glisse wrote:
>>> On Wed, Feb 20, 2019 at 07:18:17PM +, Kuehling, Felix wrote:
 [+Jerome]

 Why to we need ZONE_DEVICE. I didn't think this was needed for mirroring
 CPU page tables to device page tables.

 ARCH_HAS_HMM depends on (X86_64 || PPC64). Do we have some alternative
 for ARM support?

 Also, the name ARCH_HAS_HMM looks like it's meant to be selected by the
 CPU architecture rather than any driver. Jerome, do you have any advice?
>>> This patch is wrong you need to depend on ARCH_HAS_HMM and
>> Who selects ARCH_HAS_HMM? Currently I don't see this selected anywhere.
>> So any config option that depends on it will be invisible in menuconfig.
>> Do we need ARCH_HAS_HMM somewhere in the arch/x86/Kconfig and
>> arch/powerpc/Kconfig?
>>
>> Also, ARCH_HAS_HMM does not currently support ARM. Does that mean we
>> can't have ARM support in AMDGPU if we start using HMM?
> ARCH_HAS_HMM is defined by architecture that support HMM. So par x86
> and PPC. It should not be hard to add it to ARM (i can not remember if
> ARM has DAX yet or not, if ARM does not have DAX then you need to add
> that first).

Not having ARM support is a bummer. I just enabled KFD on ARM a few 
weeks ago. Now depending on HMM makes KFD unusable on ARM. [+Mark FYI] I 
hope this is only a temporary setback.


>> Finally, ARCH_HAS_HMM has a bunch of dependencies. If they are not met,
>> I guess it can't be enabled. Should those be "select"s instead?
> No they should not be selected, people configuring their system need
> to have the freedom of doing so. All those option are selected in all
> the big distribution.
As far as I can tell, the arch/x86/Kconfig doesn't select ARCH_HAS_HMM. 
Its default is "y", so it should be enabled on anything that meets the 
dependencies. But ZONE_DEVICE was not enabled by default. I think that's 
what broke our kernel configs.

We'll fix our own kernel configs to enable ZONE_DEVICE and ARCH_HAS_HMM 
to get our internal builds to work again.

I suspect other users with their own kernel configs will stumble over 
this and wonder why KFD and userptr support are disabled in their builds.

Regards,
   Felix


>
>> config ARCH_HAS_HMM
>>   bool
>>   default y
>>   depends on (X86_64 || PPC64)
>>   depends on ZONE_DEVICE
>>   depends on MMU && 64BIT
>>   depends on MEMORY_HOTPLUG
>>   depends on MEMORY_HOTREMOVE
>>   depends on SPARSEMEM_VMEMMAP
>>
> Cheers,
> Jérôme
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Jerome Glisse
On Wed, Feb 20, 2019 at 10:39:49PM +, Kuehling, Felix wrote:
> On 2019-02-20 5:12 p.m., Jerome Glisse wrote:
> > On Wed, Feb 20, 2019 at 07:18:17PM +, Kuehling, Felix wrote:
> >> [+Jerome]
> >>
> >> Why to we need ZONE_DEVICE. I didn't think this was needed for mirroring
> >> CPU page tables to device page tables.
> >>
> >> ARCH_HAS_HMM depends on (X86_64 || PPC64). Do we have some alternative
> >> for ARM support?
> >>
> >> Also, the name ARCH_HAS_HMM looks like it's meant to be selected by the
> >> CPU architecture rather than any driver. Jerome, do you have any advice?
> > This patch is wrong you need to depend on ARCH_HAS_HMM and
> 
> Who selects ARCH_HAS_HMM? Currently I don't see this selected anywhere. 
> So any config option that depends on it will be invisible in menuconfig. 
> Do we need ARCH_HAS_HMM somewhere in the arch/x86/Kconfig and 
> arch/powerpc/Kconfig?
> 
> Also, ARCH_HAS_HMM does not currently support ARM. Does that mean we 
> can't have ARM support in AMDGPU if we start using HMM?

ARCH_HAS_HMM is defined by architecture that support HMM. So par x86
and PPC. It should not be hard to add it to ARM (i can not remember if
ARM has DAX yet or not, if ARM does not have DAX then you need to add
that first).

> 
> Finally, ARCH_HAS_HMM has a bunch of dependencies. If they are not met, 
> I guess it can't be enabled. Should those be "select"s instead?

No they should not be selected, people configuring their system need
to have the freedom of doing so. All those option are selected in all
the big distribution.

> config ARCH_HAS_HMM
>  bool
>  default y
>  depends on (X86_64 || PPC64)
>  depends on ZONE_DEVICE
>  depends on MMU && 64BIT
>  depends on MEMORY_HOTPLUG
>  depends on MEMORY_HOTREMOVE
>  depends on SPARSEMEM_VMEMMAP
> 

Cheers,
Jérôme
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[pull] radeon and amdgpu drm-fixes-5.0

2019-02-20 Thread Alex Deucher
Hi Dave, Daniel,

A bit bigger than normal for this week due to fixes for some long
standing display issues that are bound for stable.  These changes would
be going to stable anyway, so I figured it was better via 5.0 than 5.1.
- Several display fixes
- Fix PX systems due to core changes in runtime pm
- Disable bulk moves.  They are fixed in 5.1, but fix is too invasive for 5.0

The following changes since commit a3b22b9f11d9fbc48b0291ea92259a5a810e9438:

  Linux 5.0-rc7 (2019-02-17 18:46:40 -0800)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux drm-fixes-5.0

for you to fetch changes up to a213c2c7e235cfc0e0a161a558f7fdf2fb3a624a:

  drm/amdgpu: disable bulk moves for now (2019-02-20 17:13:27 -0500)


Alex Deucher (1):
  drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime

Bhawanpreet Lakha (2):
  drm/amd/display: fix optimize_bandwidth func pointer for dce80
  drm/amd/display: set clocks to 0 on suspend on dce80

Christian König (1):
  drm/amdgpu: disable bulk moves for now

Leo (Hanghong) Ma (1):
  drm/amd/display: Fix MST reboot/poweroff sequence

Nicholas Kazlauskas (1):
  drm/amd/display: Fix negative cursor pos programming

Rafael J. Wysocki (1):
  gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime

Roman Li (1):
  drm/amd/display: Raise dispclk value for dce11

shaoyunl (1):
  drm/amdgpu: Update sdma golden setting for vega20

 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  2 ++
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c|  4 ++--
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  5 +++--
 drivers/gpu/drm/amd/display/dc/dce/dce_clk_mgr.c  | 11 ---
 .../drm/amd/display/dc/dce100/dce100_hw_sequencer.h   |  4 
 .../gpu/drm/amd/display/dc/dce80/dce80_hw_sequencer.c |  2 +-
 drivers/gpu/drm/amd/display/dc/dce80/dce80_resource.c | 19 ---
 .../gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c |  4 ++--
 drivers/gpu/drm/radeon/radeon_kms.c   |  1 +
 10 files changed, 40 insertions(+), 13 deletions(-)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Kuehling, Felix
On 2019-02-20 5:12 p.m., Jerome Glisse wrote:
> On Wed, Feb 20, 2019 at 07:18:17PM +, Kuehling, Felix wrote:
>> [+Jerome]
>>
>> Why to we need ZONE_DEVICE. I didn't think this was needed for mirroring
>> CPU page tables to device page tables.
>>
>> ARCH_HAS_HMM depends on (X86_64 || PPC64). Do we have some alternative
>> for ARM support?
>>
>> Also, the name ARCH_HAS_HMM looks like it's meant to be selected by the
>> CPU architecture rather than any driver. Jerome, do you have any advice?
> This patch is wrong you need to depend on ARCH_HAS_HMM and

Who selects ARCH_HAS_HMM? Currently I don't see this selected anywhere. 
So any config option that depends on it will be invisible in menuconfig. 
Do we need ARCH_HAS_HMM somewhere in the arch/x86/Kconfig and 
arch/powerpc/Kconfig?

Also, ARCH_HAS_HMM does not currently support ARM. Does that mean we 
can't have ARM support in AMDGPU if we start using HMM?

Finally, ARCH_HAS_HMM has a bunch of dependencies. If they are not met, 
I guess it can't be enabled. Should those be "select"s instead?

config ARCH_HAS_HMM
 bool
 default y
 depends on (X86_64 || PPC64)
 depends on ZONE_DEVICE
 depends on MMU && 64BIT
 depends on MEMORY_HOTPLUG
 depends on MEMORY_HOTREMOVE
 depends on SPARSEMEM_VMEMMAP

Regards,
   Felix

> select HMM_MIRROR you do not need to select ZONE_DEVICE
>
> So it should look like:
>
> config DRM_AMDGPU_USERPTR
>   bool "Always enable userptr write support"
>   depends on DRM_AMDGPU
>   depends on ARCH_HAS_HMM
>   select HMM_MIRROR
>   help
> This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
> isn't already selected to enabled full userptr support.
>
> I have not got around to work on amdgpu on that respect yet
> but it is on my todo list unless someone else beat me to it :)
>
> Cheers,
> Jérôme
>
>> Thanks,
>>     Felix
>>
>> On 2019-02-20 1:56 p.m., Yang, Philip wrote:
>>> Those options are needed to support HMM
>>>
>>> Change-Id: Ieb7bb3bcec07245d79a02793e6728228decc400a
>>> Signed-off-by: Philip Yang 
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/Kconfig | 2 ++
>>>1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
>>> b/drivers/gpu/drm/amd/amdgpu/Kconfig
>>> index 960a63355705..63f0542bc34b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
>>> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
>>> @@ -26,7 +26,9 @@ config DRM_AMDGPU_CIK
>>>config DRM_AMDGPU_USERPTR
>>> bool "Always enable userptr write support"
>>> depends on DRM_AMDGPU
>>> +   select ARCH_HAS_HMM
>>> select HMM_MIRROR
>>> +   select ZONE_DEVICE
>>> help
>>>   This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
>>>   isn't already selected to enabled full userptr support.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Jerome Glisse
On Wed, Feb 20, 2019 at 07:18:17PM +, Kuehling, Felix wrote:
> [+Jerome]
> 
> Why to we need ZONE_DEVICE. I didn't think this was needed for mirroring 
> CPU page tables to device page tables.
> 
> ARCH_HAS_HMM depends on (X86_64 || PPC64). Do we have some alternative 
> for ARM support?
> 
> Also, the name ARCH_HAS_HMM looks like it's meant to be selected by the 
> CPU architecture rather than any driver. Jerome, do you have any advice?

This patch is wrong you need to depend on ARCH_HAS_HMM and
select HMM_MIRROR you do not need to select ZONE_DEVICE

So it should look like:

config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
depends on DRM_AMDGPU
depends on ARCH_HAS_HMM
select HMM_MIRROR
help
  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
  isn't already selected to enabled full userptr support.

I have not got around to work on amdgpu on that respect yet
but it is on my todo list unless someone else beat me to it :)

Cheers,
Jérôme

> 
> Thanks,
>    Felix
> 
> On 2019-02-20 1:56 p.m., Yang, Philip wrote:
> > Those options are needed to support HMM
> >
> > Change-Id: Ieb7bb3bcec07245d79a02793e6728228decc400a
> > Signed-off-by: Philip Yang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/Kconfig | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
> > b/drivers/gpu/drm/amd/amdgpu/Kconfig
> > index 960a63355705..63f0542bc34b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> > +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> > @@ -26,7 +26,9 @@ config DRM_AMDGPU_CIK
> >   config DRM_AMDGPU_USERPTR
> > bool "Always enable userptr write support"
> > depends on DRM_AMDGPU
> > +   select ARCH_HAS_HMM
> > select HMM_MIRROR
> > +   select ZONE_DEVICE
> > help
> >   This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
> >   isn't already selected to enabled full userptr support.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: s2idle not working

2019-02-20 Thread Alex Deucher
On Wed, Feb 20, 2019 at 4:59 PM shahul hameed
 wrote:
>
> Hi
>
> I am facing below issue. Can any one please help me to resolve it.

As I mentioned previously, please make sure the platform supports
s2idle and that it is enabled in the sbios.  Does s2idle work on other
devices and the platform itself?

Alex

>
> Regards,
> Sk shahul
>
> -- Forwarded message -
> From: shahul hameed 
> Date: Wed, Feb 20, 2019, 3:13 PM
> Subject: s2idle not working
> To: 
>
>
> Hi Shirish
>
> I am porting Android_N and keenel 4.19.2 to Ryzen platform. Graphic card is 
> gfx9.
> Porting is done successfully. Now i am working on suspend/resume.
> Suspend/resume is working fine in deep mode.
> But suspend/resume is not working in s2idle (Suspend to idle) mode.
>
> during resume i found error in gpu driver.
>
> [  105.862161] [drm:gfx_v9_0_hw_init [amdgpu]] *ERROR* KCQ enable failed 
> (scratch(0xC040)=0xCAFEDEAD)
> [  105.862187] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume 
> of IP block  failed -22
> [  105.862210] [drm:amdgpu_device_resume [amdgpu]] *ERROR* 
> amdgpu_device_ip_resume failed (-22).
> [  105.862215] dpm_run_callback(): pci_pm_resume+0x0/0xd6 returns -22
> [  105.862345] PM: Device :03:00.0 failed to resume async: error -22
>
> same issue is reproduced with ubuntu16.04.
>
> Can you please help me to resolve this issue.
> Thanks in Advance,
> Regards,
> Sk shahul.
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Fwd: s2idle not working

2019-02-20 Thread shahul hameed
Hi

I am facing below issue. Can any one please help me to resolve it.

Regards,
Sk shahul

-- Forwarded message -
From: shahul hameed 
Date: Wed, Feb 20, 2019, 3:13 PM
Subject: s2idle not working
To: 


Hi Shirish

I am porting Android_N and keenel 4.19.2 to Ryzen platform. Graphic card is
gfx9.
Porting is done successfully. Now i am working on suspend/resume.
Suspend/resume is working fine in deep mode.
But suspend/resume is not working in s2idle (Suspend to idle) mode.

during resume i found error in gpu driver.

[  105.862161] [drm:gfx_v9_0_hw_init [amdgpu]] *ERROR* KCQ enable failed
(scratch(0xC040)=0xCAFEDEAD)
[  105.862187] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume
of IP block  failed -22
[  105.862210] [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-22).
[  105.862215] dpm_run_callback(): pci_pm_resume+0x0/0xd6 returns -22
[  105.862345] PM: Device :03:00.0 failed to resume async: error -22

same issue is reproduced with ubuntu16.04.

Can you please help me to resolve this issue.
Thanks in Advance,
Regards,
Sk shahul.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: s2idle not working

2019-02-20 Thread shahul hameed
Thanks, I will.

On Wed, Feb 20, 2019, 8:29 PM S, Shirish  wrote:

> Please mail your query to amd-gfx@lists.freedesktop.org .
>
>
>
>
>
>
>
> Regards,
>
> Shirish S
>
>
>
> *From:* shahul hameed 
> *Sent:* Wednesday, February 20, 2019 3:14 PM
> *To:* S, Shirish 
> *Subject:* s2idle not working
>
>
>
> Hi Shirish
>
>
>
> I am porting Android_N and keenel 4.19.2 to Ryzen platform. Graphic card
> is gfx9.
>
> Porting is done successfully. Now i am working on suspend/resume.
>
> Suspend/resume is working fine in deep mode.
>
> But suspend/resume is not working in s2idle (Suspend to idle) mode.
>
>
>
> during resume i found error in gpu driver.
>
>
>
> [  105.862161] [drm:gfx_v9_0_hw_init [amdgpu]] *ERROR* KCQ enable failed
> (scratch(0xC040)=0xCAFEDEAD)
> [  105.862187] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
> resume of IP block  failed -22
> [  105.862210] [drm:amdgpu_device_resume [amdgpu]] *ERROR*
> amdgpu_device_ip_resume failed (-22).
> [  105.862215] dpm_run_callback(): pci_pm_resume+0x0/0xd6 returns -22
> [  105.862345] PM: Device :03:00.0 failed to resume async: error -22
>
>
>
> same issue is reproduced with ubuntu16.04.
>
>
>
> Can you please help me to resolve this issue.
>
> Thanks in Advance,
>
> Regards,
>
> Sk shahul.
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Kuehling, Felix

On 2019-02-20 1:41 a.m., Thomas Hellstrom wrote:
> On Tue, 2019-02-19 at 17:06 +, Kuehling, Felix wrote:
>> On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:
>>> On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:
 Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:
> On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:
>> Another good question is also why the heck the acc_size
>> counts
>> towards
>> the DMA32 zone?
> DMA32 TTM pages are accounted in the DMA32 zone. Other pages
> are
> not.
 Yeah, I'm perfectly aware of this. But this is for the accounting
 size!

 We have an accounting for the stuff needed additional to the
 pages
 backing the BO (e.g. the page and DMA addr array).

 And from the bug description it sounds like we use the DMA32 zone
 for
 this accounting which of course is completely nonsense.
>>> It's actually accounted in all available zones, since it would be
>>> pretty hard to determine exactly where that memory should be
>>> accounted.
>>> In particular if it's vmalloced. It might be DMA32, it might not.
>>> Given
>>> the objective of stopping malicious user-space from exhausting the
>>> DMA32 zone it was, at the time the code was written, a reasonable
>>> approximation. With ever increasing memory sizes, there might be
>>> better
>>> solutions?
>> As far as I can see, in TTM, ttm_mem_global_alloc is only used for
>> the
>> acc_size in ttm_bo_init_reserved. Other than that vmwgfx also seems
>> to
>> use it to account for a few things that are allocated with kmalloc.
>>
>> So would a better solution be to change ttm_mem_global_alloc to use
>> only
>> the kernel zone?
>>
> IMO we need to determine what functionality to keep and then the best
> solution. The current code does its job, but is obviously too
> restrictive. Both of the solutions you suggest open up for potential
> DOS attacks (DMA32 and kernel zones are not mutually exclusive. They
> overlap).
On x86 with HIGHMEM there is no dma32 zone. Why do we need one on 
x86_64? Can we make x86_64 more like HIGHMEM instead?

Regards,
   Felix


>
>
> /Thomas
>
>
>
>
>> Regards,
>> Felix
>>
>>
>>> /Thomas
>>>
 Christian.

> For small persistent allocations using ttm_mem_global_alloc(),
> they
> are
> accounted also in the DMA32 zone, which may cause over-
> accounting
> of
> that zone, but that's pretty unlikely to be a big problem..
>
> /Thomas
>
>
>
>
>
>> In other words why should the internal bookkeeping pages be
>> allocated
>> in
>> the DMA32 zone?
>>
>> That doesn't sounds valid to me in any way,
>> Christian.
>>
>> Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:
>>> Hmm,
>>>
>>> This zone was intended to stop TTM page allocations from
>>> exhausting
>>> the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by
>>> default,
>>> which
>>> means if we drop this check, other devices may stop
>>> functioning
>>> unexpectedly?
>>>
>>> However, in the end I'd expect the kernel page allocation
>>> system
>>> to
>>> make sure there are some pages left in the DMA32 zone,
>>> otherwise
>>> random non-IO page allocations would also potentially
>>> exhaust
>>> the
>>> DMA32 zone without anybody caring, which means removing
>>> this
>>> zone
>>> wouldn't be any worse than whatever other subsystems may be
>>> doing
>>> already...
>>>
>>> /Thomas
>>>
>>>
>>> On 2/16/19 12:02 AM, Kuehling, Felix wrote:
 This is an RFC. I'm not sure this is the right solution,
 but
 it
 highlights the problem I'm trying to solve.

 The dma32_zone limits the acc_size of all allocated BOs
 to
 2GB.
 On a
 64-bit system with hundreds of GB of system memory and
 GPU
 memory,
 this can become a bottle neck. We're seeing TTM memory
 allocation
 failures not because we're truly out of memory, but
 because
 we're
 out of space in the dma32_zone for the acc_size needed
 for
 our BO
 book-keeping.

 Signed-off-by: Felix Kuehling 
 CC: thellst...@vmware.com
 CC: christian.koe...@amd.com
 ---
  drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/drivers/gpu/drm/ttm/ttm_memory.c
 b/drivers/gpu/drm/ttm/ttm_memory.c
 index f1567c3..bb05365 100644
 --- a/drivers/gpu/drm/ttm/ttm_memory.c
 +++ b/drivers/gpu/drm/ttm/ttm_memory.c
 @@ -363,7 +363,7 @@ static int
 ttm_mem_init_highmem_zone(struct
 ttm_mem_global *glob,
  glob->zones[glob->num_zones++] = zone;
  return 0;
  }
 

Re: [PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Kuehling, Felix
[+Jerome]

Why to we need ZONE_DEVICE. I didn't think this was needed for mirroring 
CPU page tables to device page tables.

ARCH_HAS_HMM depends on (X86_64 || PPC64). Do we have some alternative 
for ARM support?

Also, the name ARCH_HAS_HMM looks like it's meant to be selected by the 
CPU architecture rather than any driver. Jerome, do you have any advice?

Thanks,
   Felix

On 2019-02-20 1:56 p.m., Yang, Philip wrote:
> Those options are needed to support HMM
>
> Change-Id: Ieb7bb3bcec07245d79a02793e6728228decc400a
> Signed-off-by: Philip Yang 
> ---
>   drivers/gpu/drm/amd/amdgpu/Kconfig | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
> b/drivers/gpu/drm/amd/amdgpu/Kconfig
> index 960a63355705..63f0542bc34b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> @@ -26,7 +26,9 @@ config DRM_AMDGPU_CIK
>   config DRM_AMDGPU_USERPTR
>   bool "Always enable userptr write support"
>   depends on DRM_AMDGPU
> + select ARCH_HAS_HMM
>   select HMM_MIRROR
> + select ZONE_DEVICE
>   help
> This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
> isn't already selected to enabled full userptr support.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: select ARCH_HAS_HMM and ZONE_DEVICE option

2019-02-20 Thread Yang, Philip
Those options are needed to support HMM

Change-Id: Ieb7bb3bcec07245d79a02793e6728228decc400a
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 960a63355705..63f0542bc34b 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,7 +26,9 @@ config DRM_AMDGPU_CIK
 config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
depends on DRM_AMDGPU
+   select ARCH_HAS_HMM
select HMM_MIRROR
+   select ZONE_DEVICE
help
  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
  isn't already selected to enabled full userptr support.
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: disable userptr if swiotlb is active

2019-02-20 Thread Christian König
Well in the worst case you could end up having a bounce buffer mapped 
instead of the real page.


Christian.

Am 20.02.19 um 18:02 schrieb Kuehling, Felix:

I guess we'll need something similar for KFD? I don't think we've ever
intentionally tested KFD with swiotlb. But I've seen some backtraces
with swiotlb in them before. I wonder how badly broken it is ...

Regards,
    Felix

On 2019-02-20 8:46 a.m., Christian König wrote:

Otherwise we can't be sure that we won't end up with a bounce buffer.

Signed-off-by: Christian König 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d21dd2f369da..abc65633119b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -289,6 +289,10 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
if (offset_in_page(args->addr | args->size))
return -EINVAL;
   
+	/* We can't do this when swiotlb is active */

+   if (adev->needs_swiotlb)
+   return -ENXIO;
+
/* reject unknown flag values */
if (args->flags & ~(AMDGPU_GEM_USERPTR_READONLY |
AMDGPU_GEM_USERPTR_ANONONLY | AMDGPU_GEM_USERPTR_VALIDATE |


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-20 Thread Mikhail Gavrilov
On Wed, 20 Feb 2019 at 20:39, Grodzovsky, Andrey
 wrote:
> No, we only fixed the original deadlock with display driver during GPU
> reset. I still didn't have time to go over your captures for the GPU
> page fault.
>
> The deadlock we see here is another deadlock, different from the one
> already fixed. I suggest you open a bugzilla ticket for this and add me
> there so we can track it and take care of it.
>
> Andrey
>

Ok, here I filled bug report about new deadlock:
https://bugs.freedesktop.org/show_bug.cgi?id=109692

--
Best Regards,
Mike Gavrilov.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: disable userptr if swiotlb is active

2019-02-20 Thread Kuehling, Felix
I guess we'll need something similar for KFD? I don't think we've ever 
intentionally tested KFD with swiotlb. But I've seen some backtraces 
with swiotlb in them before. I wonder how badly broken it is ...

Regards,
   Felix

On 2019-02-20 8:46 a.m., Christian König wrote:
> Otherwise we can't be sure that we won't end up with a bounce buffer.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index d21dd2f369da..abc65633119b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -289,6 +289,10 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, 
> void *data,
>   if (offset_in_page(args->addr | args->size))
>   return -EINVAL;
>   
> + /* We can't do this when swiotlb is active */
> + if (adev->needs_swiotlb)
> + return -ENXIO;
> +
>   /* reject unknown flag values */
>   if (args->flags & ~(AMDGPU_GEM_USERPTR_READONLY |
>   AMDGPU_GEM_USERPTR_ANONONLY | AMDGPU_GEM_USERPTR_VALIDATE |
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c

2019-02-20 Thread Michael Labriola
On Wed, Feb 20, 2019 at 11:00 AM Christian König
 wrote:
>
> Am 20.02.19 um 16:16 schrieb Michael Labriola:
> > On Wed, Feb 20, 2019 at 9:30 AM Deucher, Alexander
> >  wrote:
> >> Reviewed-by: Alex Deucher 
> >> 
> >> From: amd-gfx  on behalf of 
> >> Christian König 
> >> Sent: Wednesday, February 20, 2019 7:47 AM
> >> To: michael.d.labri...@gmail.com; amd-gfx@lists.freedesktop.org
> >> Subject: [PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c
> >>
> >> This got messed up by "drm: change func to better detect wether swiotlb
> >> is needed".
> > Shoot, I thought I hunted all those down!  Sorry about that.
>
> Can I get an rb or ab so that I can push it directly after your change?

Reviewed-by: Michael D Labriola 

>
> BTW: I pushed your change to drm-misc-next and its on it way to upstream.

Great, thanks!

Is this something that should get fed into the stable branches?

-Mike

> Christian.
>
> >
> > -Mike
> >
> >> Signed-off-by: Christian König 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> index 9fc3296592fe..98fd9208877f 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> @@ -886,7 +886,7 @@ static int gmc_v6_0_sw_init(void *handle)
> >>   pci_set_consistent_dma_mask(adev->pdev, 
> >> DMA_BIT_MASK(32));
> >>   dev_warn(adev->dev, "amdgpu: No coherent DMA 
> >> available.\n");
> >>   }
> >> -   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> >> +   adev->need_swiotlb = drm_need_swiotlb(dma_bits);
> >>
> >>   r = gmc_v6_0_init_microcode(adev);
> >>   if (r) {
> >> --
> >> 2.17.1
> >>
> >> ___
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


-- 
Michael D Labriola
21 Rip Van Winkle Cir
Warwick, RI 02886
401-316-9844 (cell)
401-848-8871 (work)
401-234-1306 (home)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c

2019-02-20 Thread Christian König

Am 20.02.19 um 16:16 schrieb Michael Labriola:

On Wed, Feb 20, 2019 at 9:30 AM Deucher, Alexander
 wrote:

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Christian König 

Sent: Wednesday, February 20, 2019 7:47 AM
To: michael.d.labri...@gmail.com; amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c

This got messed up by "drm: change func to better detect wether swiotlb
is needed".

Shoot, I thought I hunted all those down!  Sorry about that.


Can I get an rb or ab so that I can push it directly after your change?

BTW: I pushed your change to drm-misc-next and its on it way to upstream.

Christian.



-Mike


Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 9fc3296592fe..98fd9208877f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -886,7 +886,7 @@ static int gmc_v6_0_sw_init(void *handle)
  pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
  dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
  }
-   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+   adev->need_swiotlb = drm_need_swiotlb(dma_bits);

  r = gmc_v6_0_init_microcode(adev);
  if (r) {
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

2019-02-20 Thread Grodzovsky, Andrey

On 2/20/19 12:28 AM, Mikhail Gavrilov wrote:
> On Tue, 19 Feb 2019 at 20:24, Grodzovsky, Andrey
>  wrote:
>> Just pull in latest drm-next from here -
>> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
>>
>> Andrey
> Tested this kernel and result not good for me.
> 1) "amdgpu :0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0070113C"
> happens again. I thought this would fixed.

No, we only fixed the original deadlock with display driver during GPU 
reset. I still didn't have time to go over your captures for the GPU 
page fault.

The deadlock we see here is another deadlock, different from the one 
already fixed. I suggest you open a bugzilla ticket for this and add me 
there so we can track it and take care of it.

Andrey


>
> 2) After it "WARNING: possible circular locking dependency detected" happens.
>
> [  302.266337] ==
> [  302.266338] WARNING: possible circular locking dependency detected
> [  302.266340] 5.0.0-rc1-drm-next-kernel+ #1 Tainted: G C
> [  302.266341] --
> [  302.266343] kworker/5:2/871 is trying to acquire lock:
> [  302.266345] 0abbb16a
> (&(>fence_drv.lock)->rlock){-.-.}, at:
> dma_fence_remove_callback+0x1a/0x60
> [  302.266352]
> but task is already holding lock:
> [  302.266353] 6e32ba38
> (&(>job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x140
> [gpu_sched]
> [  302.266358]
> which lock already depends on the new lock.
>
> [  302.266360]
> the existing dependency chain (in reverse order) is:
> [  302.266361]
> -> #1 (&(>job_list_lock)->rlock){-.-.}:
> [  302.266366]drm_sched_process_job+0x4d/0x180 [gpu_sched]
> [  302.266368]dma_fence_signal+0x111/0x1a0
> [  302.266414]amdgpu_fence_process+0xa3/0x100 [amdgpu]
> [  302.266470]sdma_v4_0_process_trap_irq+0x6e/0xa0 [amdgpu]
> [  302.266523]amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
> [  302.266576]amdgpu_ih_process+0x84/0xf0 [amdgpu]
> [  302.266628]amdgpu_irq_handler+0x1b/0x50 [amdgpu]
> [  302.266632]__handle_irq_event_percpu+0x3f/0x290
> [  302.266635]handle_irq_event_percpu+0x31/0x80
> [  302.266637]handle_irq_event+0x34/0x51
> [  302.266639]handle_edge_irq+0x7c/0x1a0
> [  302.266643]handle_irq+0xbf/0x100
> [  302.266646]do_IRQ+0x61/0x120
> [  302.266648]ret_from_intr+0x0/0x22
> [  302.266651]cpuidle_enter_state+0xbf/0x470
> [  302.266654]do_idle+0x1ec/0x280
> [  302.266657]cpu_startup_entry+0x19/0x20
> [  302.20]start_secondary+0x1b3/0x200
> [  302.23]secondary_startup_64+0xa4/0xb0
> [  302.24]
> -> #0 (&(>fence_drv.lock)->rlock){-.-.}:
> [  302.28]_raw_spin_lock_irqsave+0x49/0x83
> [  302.266670]dma_fence_remove_callback+0x1a/0x60
> [  302.266673]drm_sched_stop+0x59/0x140 [gpu_sched]
> [  302.266717]amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
> [  302.266761]amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
> [  302.266822]amdgpu_job_timedout+0x109/0x130 [amdgpu]
> [  302.266827]drm_sched_job_timedout+0x40/0x70 [gpu_sched]
> [  302.266831]process_one_work+0x272/0x5d0
> [  302.266834]worker_thread+0x50/0x3b0
> [  302.266836]kthread+0x108/0x140
> [  302.266839]ret_from_fork+0x27/0x50
> [  302.266840]
> other info that might help us debug this:
>
> [  302.266841]  Possible unsafe locking scenario:
>
> [  302.266842]CPU0CPU1
> [  302.266843]
> [  302.266844]   lock(&(>job_list_lock)->rlock);
> [  302.266846]
> lock(&(>fence_drv.lock)->rlock);
> [  302.266847]
> lock(&(>job_list_lock)->rlock);
> [  302.266849]   lock(&(>fence_drv.lock)->rlock);
> [  302.266850]
>  *** DEADLOCK ***
>
> [  302.266852] 5 locks held by kworker/5:2/871:
> [  302.266853]  #0: d133fb6e ((wq_completion)"events"){+.+.},
> at: process_one_work+0x1e9/0x5d0
> [  302.266857]  #1: 8a5c3f7e
> ((work_completion)(&(>work_tdr)->work)){+.+.}, at:
> process_one_work+0x1e9/0x5d0
> [  302.266862]  #2: b9b2c76f (>lock_reset){+.+.}, at:
> amdgpu_device_lock_adev+0x17/0x40 [amdgpu]
> [  302.266908]  #3: ac637728 (>lock_hidden){+.+.}, at:
> kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
> [  302.266965]  #4: 6e32ba38
> (&(>job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x140
> [gpu_sched]
> [  302.266971]
> stack backtrace:
> [  302.266975] CPU: 5 PID: 871 Comm: kworker/5:2 Tainted: G C
>5.0.0-rc1-drm-next-kernel+ #1
> [  302.266976] Hardware name: System manufacturer System Product
> Name/ROG STRIX X470-I GAMING, BIOS 1103 11/16/2018
> [  302.266980] Workqueue: events drm_sched_job_timedout [gpu_sched]
> [  302.266982] Call 

Re: [PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c

2019-02-20 Thread Michael Labriola
On Wed, Feb 20, 2019 at 9:30 AM Deucher, Alexander
 wrote:
>
> Reviewed-by: Alex Deucher 
> 
> From: amd-gfx  on behalf of Christian 
> König 
> Sent: Wednesday, February 20, 2019 7:47 AM
> To: michael.d.labri...@gmail.com; amd-gfx@lists.freedesktop.org
> Subject: [PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c
>
> This got messed up by "drm: change func to better detect wether swiotlb
> is needed".

Shoot, I thought I hunted all those down!  Sorry about that.

-Mike

>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> index 9fc3296592fe..98fd9208877f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> @@ -886,7 +886,7 @@ static int gmc_v6_0_sw_init(void *handle)
>  pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>  dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
>  }
> -   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> +   adev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
>  r = gmc_v6_0_init_microcode(adev);
>  if (r) {
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

-- 
Michael D Labriola
21 Rip Van Winkle Cir
Warwick, RI 02886
401-316-9844 (cell)
401-848-8871 (work)
401-234-1306 (home)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: s2idle not working

2019-02-20 Thread S, Shirish
Please mail your query to 
amd-gfx@lists.freedesktop.org .



Regards,
Shirish S

From: shahul hameed 
Sent: Wednesday, February 20, 2019 3:14 PM
To: S, Shirish 
Subject: s2idle not working

Hi Shirish

I am porting Android_N and keenel 4.19.2 to Ryzen platform. Graphic card is 
gfx9.
Porting is done successfully. Now i am working on suspend/resume.
Suspend/resume is working fine in deep mode.
But suspend/resume is not working in s2idle (Suspend to idle) mode.

during resume i found error in gpu driver.

[  105.862161] [drm:gfx_v9_0_hw_init [amdgpu]] *ERROR* KCQ enable failed 
(scratch(0xC040)=0xCAFEDEAD)
[  105.862187] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of 
IP block  failed -22
[  105.862210] [drm:amdgpu_device_resume [amdgpu]] *ERROR* 
amdgpu_device_ip_resume failed (-22).
[  105.862215] dpm_run_callback(): pci_pm_resume+0x0/0xd6 returns -22
[  105.862345] PM: Device :03:00.0 failed to resume async: error -22

same issue is reproduced with ubuntu16.04.

Can you please help me to resolve this issue.
Thanks in Advance,
Regards,
Sk shahul.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: disable userptr if swiotlb is active

2019-02-20 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Christian 
König 
Sent: Wednesday, February 20, 2019 8:46 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: disable userptr if swiotlb is active

Otherwise we can't be sure that we won't end up with a bounce buffer.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d21dd2f369da..abc65633119b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -289,6 +289,10 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
 if (offset_in_page(args->addr | args->size))
 return -EINVAL;

+   /* We can't do this when swiotlb is active */
+   if (adev->needs_swiotlb)
+   return -ENXIO;
+
 /* reject unknown flag values */
 if (args->flags & ~(AMDGPU_GEM_USERPTR_READONLY |
 AMDGPU_GEM_USERPTR_ANONONLY | AMDGPU_GEM_USERPTR_VALIDATE |
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: disable bulk moves for now

2019-02-20 Thread Deucher, Alexander
Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Christian 
König 
Sent: Wednesday, February 20, 2019 9:16 AM
To: amd-gfx@lists.freedesktop.org
Cc: sta...@vger.kernel.org
Subject: [PATCH] drm/amdgpu: disable bulk moves for now

The changes to fix those are two invasive for backporting.

Just disable the feature in 4.20 and 5.0.

Signed-off-by: Christian König 
Cc: [4.20+]
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0877ff9a9594..c3643dde3e9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -637,12 +637,14 @@ void amdgpu_vm_move_to_lru_tail(struct amdgpu_device 
*adev,
 struct ttm_bo_global *glob = adev->mman.bdev.glob;
 struct amdgpu_vm_bo_base *bo_base;

+#if 0
 if (vm->bulk_moveable) {
 spin_lock(>lru_lock);
 ttm_bo_bulk_move_lru_tail(>lru_bulk_move);
 spin_unlock(>lru_lock);
 return;
 }
+#endif

 memset(>lru_bulk_move, 0, sizeof(vm->lru_bulk_move));

--
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 000/138] The new SW SMU driver of amdgpu

2019-02-20 Thread Deucher, Alexander
Series is:
Acked-by: Alex Deucher 

From: Huang, Ray
Sent: Wednesday, February 20, 2019 7:51 AM
To: Huang, Ray; amd-gfx@lists.freedesktop.org; Deucher, Alexander
Cc: Gao, Likun; Wang, Kevin(Yang); Gui, Jack
Subject: RE: [PATCH 000/138] The new SW SMU driver of amdgpu

Ping.
May I have your comments?

Thanks,
Ray

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Huang Rui
> Sent: Friday, January 25, 2019 6:23 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Gao, Likun ; Wang, Kevin(Yang)
> ; Huang, Ray ; Gui, Jack
> 
> Subject: [PATCH 000/138] The new SW SMU driver of amdgpu
>
> Hi all,
>
> The series of patches are to implement a new SW SMU driver for future asics.
>
> Background:
> The powerplay driver will be retired. The final version is for vega20 with
> SMU11. However, the future asic will use the new swSMU framework to
> implement as
> well. Here is the first version of new sw smu driver that is basing on vega20.
>
> Purpose:
> We would like to do re-arch for linux power codes to use a new sw SMU ip
> block
> for future asics. We hope to write a simple and readable framework for Linux.
>
> Currently, the default path is still with powerplay on vega20. So far we don’t
> plan to switch default path to new swSMU design for vega20. And we can
> use the
> module parameter amdgpu_dpm to switch it to new SW SMU design
> (modprobe amdgpu
> dpm=1).
>
> Development Items:
> - Setup new SMU IP block skeleton.
> - Implement SMC firmware loading function.
> - Implement SMC table data structure.
> - Implement SMU v11 indirect register (MP1) read/write and SMC message
> sending
>   helpers.
> - Implement SMU v11 SMC table initialization (read from vbios, parse,
> populate,
>   and write back to smc).
> - Implement SMU v11 memory pool location function.
> - Enable DPM feature for SMU v11 and implement DPM control callback
> function.
> - Enable interfaces for starting tools.
> - Implement SMU v11 power control and power containment functions.
> - Implement and enable SMU v11 thermal/fan control function.
> - Implement SMU interfaces placeholder for DC, VCN, and KFD driver.
> - Enable and align sys interface in the amdgpu_pm.c.
>
> So far, Kevin, Likun, Jack, and I have enabled features such as dpm, od,
> thermal, and etc. with new sw smu driver. (Thanks to Kevin, Likun and Jack's
> great effort!)
>
> Any comments are warm for us.
>
> Thanks & Best Regards!
> Ray
>
>
> Chengming Gui (14):
>   drm/amd/powerplay: implement power_dpm_state sys interface for
> SMU11
>   drm/amd/powerplay: add watermarks related data structs and function
> for SMU11.
>   drm/amd/powerplay: implement pp_power_profile_mode sys inerface for
> SMU11
>   drm/amd/powerplay: add display_config to handle display config for
> SMU11.
>   drm/amd/powerplay: add mclk_latency_table struct and smu_clocks struct
> for SMU11
>   drm/amd/powerplay: add enable_umd_pstate functions for SMU11
>   drm/amd/powerplay: add get_profiling_clk_mask functions for SMU11
>   drm/amd/powerplay: add set_uclk_to_highest_level for SMU11
>   drm/amd/powerplay: add display_config_changed for SMU11.
>   drm/amd/powerplay: add apply_clock_adjust_rules for SMU11.
>   drm/amd/powerplay: add vega20_notify_smc_display_config functions for
> SMU11
>   drm/amd/powerplay: add vega20_find/force_higest/lowest_dpm for
> SMU11
>   drm/amd/powerplay: add vega20_unforce_dpm_levels for SMU11.
>   drm/amd/powerplay: implement power_dpm_force_performance_level
> for
> SMU11
>
> Huang Rui (53):
>   drm/amd/powerplay: add new smu ip block
>   drm/amd/powerplay: add smu11 sub block for SMU IP
>   drm/amd/powerplay: add firmware loading interface
>   drm/amd/powerplay: add fw load checking interface
>   drm/amd/powerplay: add interface to read pptable from vbios
>   drm/amd/powerplay: add placeholder of smu_initialize_pptable
>   drm/amd/powerplay: add interface to init smc tables (v2)
>   drm/amd/powerplay: add interface to init power (v2)
>   drm/amd/powerplay: add interface to get vbios bootup values (v2)
>   drm/amd/powerplay: add interface to check pptable (v2)
>   drm/amd/powerplay: add interface to init fb allocations (v2)
>   drm/amd/powerplay: add interface to parse pptable (v2)
>   drm/amd/powerplay: add interface to populate smc pptable (v2)
>   drm/amd/powerplay: add interface to check fw version (v2)
>   drm/amd/powerplay: add interface to write pptable (v2)
>   drm/amd/powerplay: add interface to set min dcef deep sleep (v2)
>   drm/amd/powerplay: add interface to set tool table location (v2)
>   drm/amd/powerplay: add interface to allocate memory pool (v2)
>   drm/amd/powerplay: add interface to notify memory pool location (v2)
>   drm/amd/powerplay: add interfaces for smu resume
>   drm/amd/powerplay: add resume sequence placeholder for smu ip block
>   drm/amdgpu: enable new smu ip block for vega20
>   drm/amd/powerplay: add new ppsmc header for smu11 (v2)
>   

Re: [PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c

2019-02-20 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Christian 
König 
Sent: Wednesday, February 20, 2019 7:47 AM
To: michael.d.labri...@gmail.com; amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c

This got messed up by "drm: change func to better detect wether swiotlb
is needed".

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 9fc3296592fe..98fd9208877f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -886,7 +886,7 @@ static int gmc_v6_0_sw_init(void *handle)
 pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
 }
-   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+   adev->need_swiotlb = drm_need_swiotlb(dma_bits);

 r = gmc_v6_0_init_microcode(adev);
 if (r) {
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: disable bulk moves for now

2019-02-20 Thread Christian König
The changes to fix those are two invasive for backporting.

Just disable the feature in 4.20 and 5.0.

Signed-off-by: Christian König 
Cc: [4.20+]
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0877ff9a9594..c3643dde3e9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -637,12 +637,14 @@ void amdgpu_vm_move_to_lru_tail(struct amdgpu_device 
*adev,
struct ttm_bo_global *glob = adev->mman.bdev.glob;
struct amdgpu_vm_bo_base *bo_base;
 
+#if 0
if (vm->bulk_moveable) {
spin_lock(>lru_lock);
ttm_bo_bulk_move_lru_tail(>lru_bulk_move);
spin_unlock(>lru_lock);
return;
}
+#endif
 
memset(>lru_bulk_move, 0, sizeof(vm->lru_bulk_move));
 
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: disable userptr if swiotlb is active

2019-02-20 Thread Christian König
Otherwise we can't be sure that we won't end up with a bounce buffer.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d21dd2f369da..abc65633119b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -289,6 +289,10 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
if (offset_in_page(args->addr | args->size))
return -EINVAL;
 
+   /* We can't do this when swiotlb is active */
+   if (adev->needs_swiotlb)
+   return -ENXIO;
+
/* reject unknown flag values */
if (args->flags & ~(AMDGPU_GEM_USERPTR_READONLY |
AMDGPU_GEM_USERPTR_ANONONLY | AMDGPU_GEM_USERPTR_VALIDATE |
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/powerplay: Get fix clock info when dpm is disabled for the clock

2019-02-20 Thread Russell, Kent
Eric's right. For now, we have a function that is a generic isDPMAvailable, 
that checks for the presence of the "dpm_state" sysfs file. In this case if we 
only have 1 clock available, it will just report that one clock for all 
outputs. So the SMI can handle this effectively. Thanks for checking though 
Felix, I definitely appreciate it.

 Kent

> -Original Message-
> From: Huang, JinHuiEric
> Sent: Tuesday, February 19, 2019 2:58 PM
> To: Kuehling, Felix ; amd-
> g...@lists.freedesktop.org; Russell, Kent 
> Subject: Re: [PATCH] drm/powerplay: Get fix clock info when dpm is disabled
> for the clock
> 
> "rocm-smi -s" doesn't parse output from driver and it directly display output,
> so it will not break rocm-smi.
> 
> Regards,
> 
> Eric
> 
> On 2019-02-19 2:51 p.m., Kuehling, Felix wrote:
> > [+Kent]
> >
> > On 2019-02-19 12:08 p.m., Huang, JinHuiEric wrote:
> >
> >> It seems redundant to print out all DPM levels when DPM is disabled.
> >> Probably it looks meaningful to print out "Mhz (DPM
> >> disabled)".
> > If you change the format, you risk breaking tools such as rocm-smi. Is
> > there some other way for rocm-smi to tell that DPM is disabled?
> >
> > Regards,
> >     Felix
> >
> >
> >> Regards,
> >>
> >> Eric
> >>
> >> On 2019-02-15 4:25 p.m., Liu, Shaoyun wrote:
> >>> When DPM for the specific clock is difabled, driver should still
> >>> able to get fix clock info from the pptable
> >>>
> >>> Change-Id: Ic609203b3b87aa75b0cfd57b57717b3bb89daf48
> >>> Signed-off-by: shaoyunl 
> >>> ---
> >>> drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 16 ---
> -
> >>> 1 file changed, 16 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> >>> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> >>> index aad79aff..2eae0b4 100644
> >>> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> >>> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> >>> @@ -2641,10 +2641,6 @@ static int vega20_get_sclks(struct pp_hwmgr
> *hwmgr,
> >>>   struct vega20_single_dpm_table *dpm_table = &(data-
> >dpm_table.gfx_table);
> >>>   int i, count;
> >>>
> >>> - PP_ASSERT_WITH_CODE(data-
> >smu_features[GNLD_DPM_GFXCLK].enabled,
> >>> - "[GetSclks]: gfxclk dpm not enabled!\n",
> >>> - return -EPERM);
> >>> -
> >>>   count = (dpm_table->count > MAX_NUM_CLOCKS) ?
> MAX_NUM_CLOCKS : dpm_table->count;
> >>>   clocks->num_levels = count;
> >>>
> >>> @@ -2670,10 +2666,6 @@ static int vega20_get_memclocks(struct
> pp_hwmgr *hwmgr,
> >>>   struct vega20_single_dpm_table *dpm_table = &(data-
> >dpm_table.mem_table);
> >>>   int i, count;
> >>>
> >>> - PP_ASSERT_WITH_CODE(data-
> >smu_features[GNLD_DPM_UCLK].enabled,
> >>> - "[GetMclks]: uclk dpm not enabled!\n",
> >>> - return -EPERM);
> >>> -
> >>>   count = (dpm_table->count > MAX_NUM_CLOCKS) ?
> MAX_NUM_CLOCKS : dpm_table->count;
> >>>   clocks->num_levels = data->mclk_latency_table.count = count;
> >>>
> >>> @@ -2696,10 +2688,6 @@ static int vega20_get_dcefclocks(struct
> pp_hwmgr *hwmgr,
> >>>   struct vega20_single_dpm_table *dpm_table = &(data-
> >dpm_table.dcef_table);
> >>>   int i, count;
> >>>
> >>> - PP_ASSERT_WITH_CODE(data-
> >smu_features[GNLD_DPM_DCEFCLK].enabled,
> >>> - "[GetDcfclocks]: dcefclk dpm not enabled!\n",
> >>> - return -EPERM);
> >>> -
> >>>   count = (dpm_table->count > MAX_NUM_CLOCKS) ?
> MAX_NUM_CLOCKS : dpm_table->count;
> >>>   clocks->num_levels = count;
> >>>
> >>> @@ -2719,10 +2707,6 @@ static int vega20_get_socclocks(struct
> pp_hwmgr *hwmgr,
> >>>   struct vega20_single_dpm_table *dpm_table = &(data-
> >dpm_table.soc_table);
> >>>   int i, count;
> >>>
> >>> - PP_ASSERT_WITH_CODE(data-
> >smu_features[GNLD_DPM_SOCCLK].enabled,
> >>> - "[GetSocclks]: socclk dpm not enabled!\n",
> >>> - return -EPERM);
> >>> -
> >>>   count = (dpm_table->count > MAX_NUM_CLOCKS) ?
> MAX_NUM_CLOCKS : dpm_table->count;
> >>>   clocks->num_levels = count;
> >>>
> >> ___
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH 000/138] The new SW SMU driver of amdgpu

2019-02-20 Thread Huang, Ray
Ping.
May I have your comments?

Thanks,
Ray

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Huang Rui
> Sent: Friday, January 25, 2019 6:23 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Gao, Likun ; Wang, Kevin(Yang)
> ; Huang, Ray ; Gui, Jack
> 
> Subject: [PATCH 000/138] The new SW SMU driver of amdgpu
> 
> Hi all,
> 
> The series of patches are to implement a new SW SMU driver for future asics.
> 
> Background:
> The powerplay driver will be retired. The final version is for vega20 with
> SMU11. However, the future asic will use the new swSMU framework to
> implement as
> well. Here is the first version of new sw smu driver that is basing on vega20.
> 
> Purpose:
> We would like to do re-arch for linux power codes to use a new sw SMU ip
> block
> for future asics. We hope to write a simple and readable framework for Linux.
> 
> Currently, the default path is still with powerplay on vega20. So far we don’t
> plan to switch default path to new swSMU design for vega20. And we can
> use the
> module parameter amdgpu_dpm to switch it to new SW SMU design
> (modprobe amdgpu
> dpm=1).
> 
> Development Items:
> - Setup new SMU IP block skeleton.
> - Implement SMC firmware loading function.
> - Implement SMC table data structure.
> - Implement SMU v11 indirect register (MP1) read/write and SMC message
> sending
>   helpers.
> - Implement SMU v11 SMC table initialization (read from vbios, parse,
> populate,
>   and write back to smc).
> - Implement SMU v11 memory pool location function.
> - Enable DPM feature for SMU v11 and implement DPM control callback
> function.
> - Enable interfaces for starting tools.
> - Implement SMU v11 power control and power containment functions.
> - Implement and enable SMU v11 thermal/fan control function.
> - Implement SMU interfaces placeholder for DC, VCN, and KFD driver.
> - Enable and align sys interface in the amdgpu_pm.c.
> 
> So far, Kevin, Likun, Jack, and I have enabled features such as dpm, od,
> thermal, and etc. with new sw smu driver. (Thanks to Kevin, Likun and Jack's
> great effort!)
> 
> Any comments are warm for us.
> 
> Thanks & Best Regards!
> Ray
> 
> 
> Chengming Gui (14):
>   drm/amd/powerplay: implement power_dpm_state sys interface for
> SMU11
>   drm/amd/powerplay: add watermarks related data structs and function
> for SMU11.
>   drm/amd/powerplay: implement pp_power_profile_mode sys inerface for
> SMU11
>   drm/amd/powerplay: add display_config to handle display config for
> SMU11.
>   drm/amd/powerplay: add mclk_latency_table struct and smu_clocks struct
> for SMU11
>   drm/amd/powerplay: add enable_umd_pstate functions for SMU11
>   drm/amd/powerplay: add get_profiling_clk_mask functions for SMU11
>   drm/amd/powerplay: add set_uclk_to_highest_level for SMU11
>   drm/amd/powerplay: add display_config_changed for SMU11.
>   drm/amd/powerplay: add apply_clock_adjust_rules for SMU11.
>   drm/amd/powerplay: add vega20_notify_smc_display_config functions for
> SMU11
>   drm/amd/powerplay: add vega20_find/force_higest/lowest_dpm for
> SMU11
>   drm/amd/powerplay: add vega20_unforce_dpm_levels for SMU11.
>   drm/amd/powerplay: implement power_dpm_force_performance_level
> for
> SMU11
> 
> Huang Rui (53):
>   drm/amd/powerplay: add new smu ip block
>   drm/amd/powerplay: add smu11 sub block for SMU IP
>   drm/amd/powerplay: add firmware loading interface
>   drm/amd/powerplay: add fw load checking interface
>   drm/amd/powerplay: add interface to read pptable from vbios
>   drm/amd/powerplay: add placeholder of smu_initialize_pptable
>   drm/amd/powerplay: add interface to init smc tables (v2)
>   drm/amd/powerplay: add interface to init power (v2)
>   drm/amd/powerplay: add interface to get vbios bootup values (v2)
>   drm/amd/powerplay: add interface to check pptable (v2)
>   drm/amd/powerplay: add interface to init fb allocations (v2)
>   drm/amd/powerplay: add interface to parse pptable (v2)
>   drm/amd/powerplay: add interface to populate smc pptable (v2)
>   drm/amd/powerplay: add interface to check fw version (v2)
>   drm/amd/powerplay: add interface to write pptable (v2)
>   drm/amd/powerplay: add interface to set min dcef deep sleep (v2)
>   drm/amd/powerplay: add interface to set tool table location (v2)
>   drm/amd/powerplay: add interface to allocate memory pool (v2)
>   drm/amd/powerplay: add interface to notify memory pool location (v2)
>   drm/amd/powerplay: add interfaces for smu resume
>   drm/amd/powerplay: add resume sequence placeholder for smu ip block
>   drm/amdgpu: enable new smu ip block for vega20
>   drm/amd/powerplay: add new ppsmc header for smu11 (v2)
>   drm/amd/powerplay: add pptable header for smu11
>   drm/amdgpu: update atomfirmware header for smu11
>   drm/amdgpu: update new members in atomfirmware
>   drm/amd/powerplay: add smu table context structure
>   drm/amd/powerplay: add get atom data table helper
>   drm/amdgpu: move 

[PATCH] drm/amdgpu: fix dma mask check in gmc_v6_0.c

2019-02-20 Thread Christian König
This got messed up by "drm: change func to better detect wether swiotlb
is needed".

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 9fc3296592fe..98fd9208877f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -886,7 +886,7 @@ static int gmc_v6_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
}
-   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+   adev->need_swiotlb = drm_need_swiotlb(dma_bits);
 
r = gmc_v6_0_init_microcode(adev);
if (r) {
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Thomas Hellstrom
On Wed, 2019-02-20 at 08:35 +, Koenig, Christian wrote:
> Am 20.02.19 um 09:14 schrieb Thomas Hellstrom:
> > On 2/20/19 9:07 AM, Christian König wrote:
> > > Am 20.02.19 um 07:41 schrieb Thomas Hellstrom:
> > > > On Tue, 2019-02-19 at 17:06 +, Kuehling, Felix wrote:
> > > > > On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:
> > > > > > On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:
> > > > > > > Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:
> > > > > > > > On Mon, 2019-02-18 at 09:20 +, Koenig, Christian
> > > > > > > > wrote:
> > > > > > > > > Another good question is also why the heck the
> > > > > > > > > acc_size
> > > > > > > > > counts
> > > > > > > > > towards
> > > > > > > > > the DMA32 zone?
> > > > > > > > DMA32 TTM pages are accounted in the DMA32 zone. Other
> > > > > > > > pages
> > > > > > > > are
> > > > > > > > not.
> > > > > > > Yeah, I'm perfectly aware of this. But this is for the
> > > > > > > accounting
> > > > > > > size!
> > > > > > > 
> > > > > > > We have an accounting for the stuff needed additional to
> > > > > > > the
> > > > > > > pages
> > > > > > > backing the BO (e.g. the page and DMA addr array).
> > > > > > > 
> > > > > > > And from the bug description it sounds like we use the
> > > > > > > DMA32 zone
> > > > > > > for
> > > > > > > this accounting which of course is completely nonsense.
> > > > > > It's actually accounted in all available zones, since it
> > > > > > would be
> > > > > > pretty hard to determine exactly where that memory should
> > > > > > be
> > > > > > accounted.
> > > > > > In particular if it's vmalloced. It might be DMA32, it
> > > > > > might not.
> > > > > > Given
> > > > > > the objective of stopping malicious user-space from
> > > > > > exhausting the
> > > > > > DMA32 zone it was, at the time the code was written, a
> > > > > > reasonable
> > > > > > approximation. With ever increasing memory sizes, there
> > > > > > might be
> > > > > > better
> > > > > > solutions?
> > > > > As far as I can see, in TTM, ttm_mem_global_alloc is only
> > > > > used for
> > > > > the
> > > > > acc_size in ttm_bo_init_reserved. Other than that vmwgfx also
> > > > > seems
> > > > > to
> > > > > use it to account for a few things that are allocated with
> > > > > kmalloc.
> > > > > 
> > > > > So would a better solution be to change ttm_mem_global_alloc
> > > > > to use
> > > > > only
> > > > > the kernel zone?
> > > > > 
> > > > IMO we need to determine what functionality to keep and then
> > > > the best
> > > > solution. The current code does its job, but is obviously too
> > > > restrictive. Both of the solutions you suggest open up for
> > > > potential
> > > > DOS attacks (DMA32 and kernel zones are not mutually exclusive.
> > > > They
> > > > overlap).
> > > 
> > > Yeah and exactly because of this it doesn't make to much sense
> > > to 
> > > account the housekeeping stuff in both zones.
> > 
> > IMO it makes sense because (given the assumption that
> > kmalloc/vmalloc 
> > doesn't care) , accounting only in DMA32 in low memory
> > configurations 
> > will not guarantee that kernel is not exhausted. Accounting only
> > in 
> > kernel in huge memory configurations will not guarantee that DMA32
> > is 
> > not exhausted.
> 
> DMA32 is not exhausted by TTMs kmalloc/vmalloc because otherwise any 
> other kmalloc/vmalloc could exhaust it as well.
> 
> I mean its probably 20+ years ago that I last looked at stuff in
> this 
> part of the kernel, but I don't think we have ever changed the
> handling 
> of DMA/DMA32 or otherwise we would be running into massive
> exhaustion 
> problems already on modern systems.
> 
> > > IIRC the kernel takes perfectly care by itself that kmalloced
> > > memory 
> > > doesn't eat up everything in the DMA32 zone.
> > 
> > If we can somehow verify this, or at least illustrate that it's 
> > likely, I'm perfectly fine with either patch from the vmwgfx POW.
> 
> Well I actually think that the whole accounting of housekeeping
> towards 
> the memory zones is completely pointless in the first place. Even if
> we 
> swap things out we still have the same BO and TTM structures around
> anyway.

Well, the whole point of this is to stop malicious user-space apps from
exhausting memory, causing a DOS vulnerability, similaraly for example
to how processes are stopped from creating to many open file
descriptors or other kernel structures. IIRC, for example, you used to
be able to call gem open in an infinite loop with the same bo until you
brought a random big process down using the OOM killer. With vmwgfx you
can't do that. You'd end up swapping out all available BOs and then
you'd get an error.

> 
> What we should rather to is to make sure that all processes using a
> BO 
> have their RSS increased because they do so. So when a process tries
> to 
> exhaust a memory domain it is simply killed by the OOM killer.

Yes, if there is a way to reliably do that and also take care of other
kernel structures that are created by 

Re: [PATCH 1/6] drm/amdgpu: add missing license on baco files

2019-02-20 Thread Zhang, Hawking
Series is Reviewed-by: Hawking Zhang 

Regards,
Hawking 

Sent from my iPhone

> On Feb 20, 2019, at 05:00, Alex Deucher  wrote:
> 
>> On Fri, Feb 15, 2019 at 5:47 PM Alex Deucher  wrote:
>> 
>> Trivial.
>> 
>> Signed-off-by: Alex Deucher 
> 
> Ping on this series.  The first 3 patches are fixes.
> 
> Alex
> 
>> ---
>> .../gpu/drm/amd/powerplay/hwmgr/vega10_baco.c | 22 +++
>> .../gpu/drm/amd/powerplay/hwmgr/vega20_baco.c | 22 +++
>> 2 files changed, 44 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c 
>> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
>> index f94dab27f486..d5232110ec84 100644
>> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
>> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_baco.c
>> @@ -1,3 +1,25 @@
>> +/*
>> + * Copyright 2018 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included 
>> in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> #include "amdgpu.h"
>> #include "soc15.h"
>> #include "soc15_hw_ip.h"
>> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c 
>> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
>> index 0d883b358df2..edf00da8424b 100644
>> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
>> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_baco.c
>> @@ -1,3 +1,25 @@
>> +/*
>> + * Copyright 2018 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included 
>> in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> #include "amdgpu.h"
>> #include "soc15.h"
>> #include "soc15_hw_ip.h"
>> --
>> 2.20.1
>> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Koenig, Christian
Am 20.02.19 um 09:14 schrieb Thomas Hellstrom:
> On 2/20/19 9:07 AM, Christian König wrote:
>> Am 20.02.19 um 07:41 schrieb Thomas Hellstrom:
>>> On Tue, 2019-02-19 at 17:06 +, Kuehling, Felix wrote:
 On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:
> On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:
>> Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:
>>> On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:
 Another good question is also why the heck the acc_size
 counts
 towards
 the DMA32 zone?
>>> DMA32 TTM pages are accounted in the DMA32 zone. Other pages
>>> are
>>> not.
>> Yeah, I'm perfectly aware of this. But this is for the accounting
>> size!
>>
>> We have an accounting for the stuff needed additional to the
>> pages
>> backing the BO (e.g. the page and DMA addr array).
>>
>> And from the bug description it sounds like we use the DMA32 zone
>> for
>> this accounting which of course is completely nonsense.
> It's actually accounted in all available zones, since it would be
> pretty hard to determine exactly where that memory should be
> accounted.
> In particular if it's vmalloced. It might be DMA32, it might not.
> Given
> the objective of stopping malicious user-space from exhausting the
> DMA32 zone it was, at the time the code was written, a reasonable
> approximation. With ever increasing memory sizes, there might be
> better
> solutions?
 As far as I can see, in TTM, ttm_mem_global_alloc is only used for
 the
 acc_size in ttm_bo_init_reserved. Other than that vmwgfx also seems
 to
 use it to account for a few things that are allocated with kmalloc.

 So would a better solution be to change ttm_mem_global_alloc to use
 only
 the kernel zone?

>>> IMO we need to determine what functionality to keep and then the best
>>> solution. The current code does its job, but is obviously too
>>> restrictive. Both of the solutions you suggest open up for potential
>>> DOS attacks (DMA32 and kernel zones are not mutually exclusive. They
>>> overlap).
>>
>> Yeah and exactly because of this it doesn't make to much sense to 
>> account the housekeeping stuff in both zones.
>
> IMO it makes sense because (given the assumption that kmalloc/vmalloc 
> doesn't care) , accounting only in DMA32 in low memory configurations 
> will not guarantee that kernel is not exhausted. Accounting only in 
> kernel in huge memory configurations will not guarantee that DMA32 is 
> not exhausted.

DMA32 is not exhausted by TTMs kmalloc/vmalloc because otherwise any 
other kmalloc/vmalloc could exhaust it as well.

I mean its probably 20+ years ago that I last looked at stuff in this 
part of the kernel, but I don't think we have ever changed the handling 
of DMA/DMA32 or otherwise we would be running into massive exhaustion 
problems already on modern systems.

>>
>> IIRC the kernel takes perfectly care by itself that kmalloced memory 
>> doesn't eat up everything in the DMA32 zone.
>
> If we can somehow verify this, or at least illustrate that it's 
> likely, I'm perfectly fine with either patch from the vmwgfx POW.

Well I actually think that the whole accounting of housekeeping towards 
the memory zones is completely pointless in the first place. Even if we 
swap things out we still have the same BO and TTM structures around anyway.

What we should rather to is to make sure that all processes using a BO 
have their RSS increased because they do so. So when a process tries to 
exhaust a memory domain it is simply killed by the OOM killer.

Regards,
Christian.

>
> Thanks,
> Thomas
>
>
>
>>
>> Christian.
>>
>>>
>>>
>>> /Thomas
>>>
>>>
>>>
>>>
 Regards,
     Felix


> /Thomas
>
>> Christian.
>>
>>> For small persistent allocations using ttm_mem_global_alloc(),
>>> they
>>> are
>>> accounted also in the DMA32 zone, which may cause over-
>>> accounting
>>> of
>>> that zone, but that's pretty unlikely to be a big problem..
>>>
>>> /Thomas
>>>
>>>
>>>
>>>
>>>
 In other words why should the internal bookkeeping pages be
 allocated
 in
 the DMA32 zone?

 That doesn't sounds valid to me in any way,
 Christian.

 Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:
> Hmm,
>
> This zone was intended to stop TTM page allocations from
> exhausting
> the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by
> default,
> which
> means if we drop this check, other devices may stop
> functioning
> unexpectedly?
>
> However, in the end I'd expect the kernel page allocation
> system
> to
> make sure there are some pages left in the DMA32 zone,
> otherwise
> random 

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-20 Thread Koenig, Christian
Am 20.02.19 um 09:10 schrieb zhoucm1:


On 2019年02月20日 15:59, Koenig, Christian wrote:
Am 20.02.19 um 05:53 schrieb zhoucm1:


On 2019年02月19日 19:32, Koenig, Christian wrote:
Hi David,

Could you have a look if it's reasonable?

Patch #1 is also something I already fixed on my local branch.

But patch #2 won't work like this.

We can't return an error from drm_syncobj_add_point() because we already 
submitted work to the hardware. And just dropping the fence like you do in the 
patch is a clearly no-go as well.

Then do you have any idea to skip the messed order signal point?

No, I don't think we can actually do this.
But as Lionel pointed out, user mode shouldn't query a smaller timeline payload 
compared to last time, we must skip messed order signal point!

No we don't.

That userspace queries a smaller timeline payload compared to last time is 
because userspace messed up signaling order in the first place.

Additional to that I'm not sure that userspace would query a smaller timeline 
payload. IIRC our workaround for messed up signaling order handled that case 
gracefully as well.

Christian.


-David


The only solution I can see would be to lock down the syncobj to modifications 
while command submission is in progress. And that in turn would mean a huge 
bunch of ww_mutex overhead we will certainly want to avoid.

Christian.


-David

Regards,
Christian.

Am 19.02.19 um 11:46 schrieb zhoucm1:

Hi Lionel,

the attached should fix your problem and also messed signal order.

Hi Christian,

Could you have a look if it's reasonable?


btw: I pushed to change to 
https://github.com/amingriyue/timeline-syncobj-kernel, which is already rebased 
to latest drm-misc(kernel 5.0). You can directly use that branch.


-David

On 2019年02月19日 01:01, Koenig, Christian wrote:
Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:
Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj implementation?

David is the expert on that, but as far as I know that is forbidden by the 
vulkan spec.

I'm not finding anything in the vulkan spec that makes out of order signaling 
illegal.
That's why I came up with this test, just verifying that the timeline does not 
go backward in term of its payload.

Well we need to handle this case gracefully in the kernel, so it is still a 
good testcase.

Christian.


-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:
Hi David,

well I think Lionel is testing the invalid signal order on purpose :)

Anyway we really need to handle invalid order graceful here. E.g. either the 
same way as during CS or we abort and return an error message.

I think just using the same approach as during CS ist the best we can do.

Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:

Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signal isn't in-order, and we 
shouldn't lead to deadlock.

Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a warning?

Otherwise like Lionel's unexpected use cases, which easily leads to deadlock.


-David

On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for point me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: 

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Thomas Hellstrom

On 2/20/19 9:07 AM, Christian König wrote:

Am 20.02.19 um 07:41 schrieb Thomas Hellstrom:

On Tue, 2019-02-19 at 17:06 +, Kuehling, Felix wrote:

On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:

On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:

Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:

On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:

Another good question is also why the heck the acc_size
counts
towards
the DMA32 zone?

DMA32 TTM pages are accounted in the DMA32 zone. Other pages
are
not.

Yeah, I'm perfectly aware of this. But this is for the accounting
size!

We have an accounting for the stuff needed additional to the
pages
backing the BO (e.g. the page and DMA addr array).

And from the bug description it sounds like we use the DMA32 zone
for
this accounting which of course is completely nonsense.

It's actually accounted in all available zones, since it would be
pretty hard to determine exactly where that memory should be
accounted.
In particular if it's vmalloced. It might be DMA32, it might not.
Given
the objective of stopping malicious user-space from exhausting the
DMA32 zone it was, at the time the code was written, a reasonable
approximation. With ever increasing memory sizes, there might be
better
solutions?

As far as I can see, in TTM, ttm_mem_global_alloc is only used for
the
acc_size in ttm_bo_init_reserved. Other than that vmwgfx also seems
to
use it to account for a few things that are allocated with kmalloc.

So would a better solution be to change ttm_mem_global_alloc to use
only
the kernel zone?


IMO we need to determine what functionality to keep and then the best
solution. The current code does its job, but is obviously too
restrictive. Both of the solutions you suggest open up for potential
DOS attacks (DMA32 and kernel zones are not mutually exclusive. They
overlap).


Yeah and exactly because of this it doesn't make to much sense to 
account the housekeeping stuff in both zones.


IMO it makes sense because (given the assumption that kmalloc/vmalloc 
doesn't care) , accounting only in DMA32 in low memory configurations 
will not guarantee that kernel is not exhausted. Accounting only in 
kernel in huge memory configurations will not guarantee that DMA32 is 
not exhausted.




IIRC the kernel takes perfectly care by itself that kmalloced memory 
doesn't eat up everything in the DMA32 zone.


If we can somehow verify this, or at least illustrate that it's likely, 
I'm perfectly fine with either patch from the vmwgfx POW.


Thanks,
Thomas





Christian.




/Thomas





Regards,
    Felix



/Thomas


Christian.


For small persistent allocations using ttm_mem_global_alloc(),
they
are
accounted also in the DMA32 zone, which may cause over-
accounting
of
that zone, but that's pretty unlikely to be a big problem..

/Thomas






In other words why should the internal bookkeeping pages be
allocated
in
the DMA32 zone?

That doesn't sounds valid to me in any way,
Christian.

Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:

Hmm,

This zone was intended to stop TTM page allocations from
exhausting
the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by
default,
which
means if we drop this check, other devices may stop
functioning
unexpectedly?

However, in the end I'd expect the kernel page allocation
system
to
make sure there are some pages left in the DMA32 zone,
otherwise
random non-IO page allocations would also potentially
exhaust
the
DMA32 zone without anybody caring, which means removing
this
zone
wouldn't be any worse than whatever other subsystems may be
doing
already...

/Thomas


On 2/16/19 12:02 AM, Kuehling, Felix wrote:

This is an RFC. I'm not sure this is the right solution,
but
it
highlights the problem I'm trying to solve.

The dma32_zone limits the acc_size of all allocated BOs
to
2GB.
On a
64-bit system with hundreds of GB of system memory and
GPU
memory,
this can become a bottle neck. We're seeing TTM memory
allocation
failures not because we're truly out of memory, but
because
we're
out of space in the dma32_zone for the acc_size needed
for
our BO
book-keeping.

Signed-off-by: Felix Kuehling 
CC: thellst...@vmware.com
CC: christian.koe...@amd.com
---
 drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c
b/drivers/gpu/drm/ttm/ttm_memory.c
index f1567c3..bb05365 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -363,7 +363,7 @@ static int
ttm_mem_init_highmem_zone(struct
ttm_mem_global *glob,
 glob->zones[glob->num_zones++] = zone;
 return 0;
 }
-#else
+#elifndef CONFIG_64BIT
 static int ttm_mem_init_dma32_zone(struct
ttm_mem_global
*glob,
    const struct sysinfo *si)
 {
@@ -441,7 +441,7 @@ int ttm_mem_global_init(struct
ttm_mem_global
*glob)
 ret = ttm_mem_init_highmem_zone(glob, );
 if (unlikely(ret != 0))
 goto out_no_zone;

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-20 Thread zhoucm1



On 2019年02月20日 15:59, Koenig, Christian wrote:

Am 20.02.19 um 05:53 schrieb zhoucm1:




On 2019年02月19日 19:32, Koenig, Christian wrote:

Hi David,


Could you have a look if it's reasonable?


Patch #1 is also something I already fixed on my local branch.

But patch #2 won't work like this.

We can't return an error from drm_syncobj_add_point() because we 
already submitted work to the hardware. And just dropping the fence 
like you do in the patch is a clearly no-go as well.


Then do you have any idea to skip the messed order signal point?


No, I don't think we can actually do this.
But as Lionel pointed out, user mode shouldn't query a smaller timeline 
payload compared to last time, we must skip messed order signal point!


-David



The only solution I can see would be to lock down the syncobj to 
modifications while command submission is in progress. And that in 
turn would mean a huge bunch of ww_mutex overhead we will certainly 
want to avoid.


Christian.



-David


Regards,
Christian.

Am 19.02.19 um 11:46 schrieb zhoucm1:


Hi Lionel,

the attached should fix your problem and also messed signal order.

Hi Christian,

Could you have a look if it's reasonable?


btw: I pushed to change to 
https://github.com/amingriyue/timeline-syncobj-kernel, which is 
already rebased to latest drm-misc(kernel 5.0). You can directly 
use that branch.



-David


On 2019年02月19日 01:01, Koenig, Christian wrote:

Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:

Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj 
implementation?


David is the expert on that, but as far as I know that is 
forbidden by the vulkan spec.


I'm not finding anything in the vulkan spec that makes out of 
order signaling illegal.
That's why I came up with this test, just verifying that the 
timeline does not go backward in term of its payload.


Well we need to handle this case gracefully in the kernel, so it 
is still a good testcase.


Christian.



-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:

Hi David,

well I think Lionel is testing the invalid signal order on 
purpose :)


Anyway we really need to handle invalid order graceful here. 
E.g. either the same way as during CS or we abort and return an 
error message.


I think just using the same approach as during CS ist the best 
we can do.


Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:


Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if signal isn't 
in-order, and we shouldn't lead to deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just 
give a warning?


Otherwise like Lionel's unexpected use cases, which easily leads 
to deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for point me to the tests you've added in IGT.
While adding a test with that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 

Re: [PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

2019-02-20 Thread Christian König

Am 20.02.19 um 07:41 schrieb Thomas Hellstrom:

On Tue, 2019-02-19 at 17:06 +, Kuehling, Felix wrote:

On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:

On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:

Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:

On Mon, 2019-02-18 at 09:20 +, Koenig, Christian wrote:

Another good question is also why the heck the acc_size
counts
towards
the DMA32 zone?

DMA32 TTM pages are accounted in the DMA32 zone. Other pages
are
not.

Yeah, I'm perfectly aware of this. But this is for the accounting
size!

We have an accounting for the stuff needed additional to the
pages
backing the BO (e.g. the page and DMA addr array).

And from the bug description it sounds like we use the DMA32 zone
for
this accounting which of course is completely nonsense.

It's actually accounted in all available zones, since it would be
pretty hard to determine exactly where that memory should be
accounted.
In particular if it's vmalloced. It might be DMA32, it might not.
Given
the objective of stopping malicious user-space from exhausting the
DMA32 zone it was, at the time the code was written, a reasonable
approximation. With ever increasing memory sizes, there might be
better
solutions?

As far as I can see, in TTM, ttm_mem_global_alloc is only used for
the
acc_size in ttm_bo_init_reserved. Other than that vmwgfx also seems
to
use it to account for a few things that are allocated with kmalloc.

So would a better solution be to change ttm_mem_global_alloc to use
only
the kernel zone?


IMO we need to determine what functionality to keep and then the best
solution. The current code does its job, but is obviously too
restrictive. Both of the solutions you suggest open up for potential
DOS attacks (DMA32 and kernel zones are not mutually exclusive. They
overlap).


Yeah and exactly because of this it doesn't make to much sense to 
account the housekeeping stuff in both zones.


IIRC the kernel takes perfectly care by itself that kmalloced memory 
doesn't eat up everything in the DMA32 zone.


Christian.




/Thomas





Regards,
Felix



/Thomas


Christian.


For small persistent allocations using ttm_mem_global_alloc(),
they
are
accounted also in the DMA32 zone, which may cause over-
accounting
of
that zone, but that's pretty unlikely to be a big problem..

/Thomas






In other words why should the internal bookkeeping pages be
allocated
in
the DMA32 zone?

That doesn't sounds valid to me in any way,
Christian.

Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:

Hmm,

This zone was intended to stop TTM page allocations from
exhausting
the DMA32 zone. IIRC dma_alloc_coherent() uses DMA32 by
default,
which
means if we drop this check, other devices may stop
functioning
unexpectedly?

However, in the end I'd expect the kernel page allocation
system
to
make sure there are some pages left in the DMA32 zone,
otherwise
random non-IO page allocations would also potentially
exhaust
the
DMA32 zone without anybody caring, which means removing
this
zone
wouldn't be any worse than whatever other subsystems may be
doing
already...

/Thomas


On 2/16/19 12:02 AM, Kuehling, Felix wrote:

This is an RFC. I'm not sure this is the right solution,
but
it
highlights the problem I'm trying to solve.

The dma32_zone limits the acc_size of all allocated BOs
to
2GB.
On a
64-bit system with hundreds of GB of system memory and
GPU
memory,
this can become a bottle neck. We're seeing TTM memory
allocation
failures not because we're truly out of memory, but
because
we're
out of space in the dma32_zone for the acc_size needed
for
our BO
book-keeping.

Signed-off-by: Felix Kuehling 
CC: thellst...@vmware.com
CC: christian.koe...@amd.com
---
 drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c
b/drivers/gpu/drm/ttm/ttm_memory.c
index f1567c3..bb05365 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -363,7 +363,7 @@ static int
ttm_mem_init_highmem_zone(struct
ttm_mem_global *glob,
 glob->zones[glob->num_zones++] = zone;
 return 0;
 }
-#else
+#elifndef CONFIG_64BIT
 static int ttm_mem_init_dma32_zone(struct
ttm_mem_global
*glob,
const struct sysinfo *si)
 {
@@ -441,7 +441,7 @@ int ttm_mem_global_init(struct
ttm_mem_global
*glob)
 ret = ttm_mem_init_highmem_zone(glob, );
 if (unlikely(ret != 0))
 goto out_no_zone;
-#else
+#elifndef CONFIG_64BIT
 ret = ttm_mem_init_dma32_zone(glob, );
 if (unlikely(ret != 0))
 goto out_no_zone;

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org

[PATCH -next] drm/amdgpu: remove set but not used variables 'vm, bo'

2019-02-20 Thread YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c: In function 
'update_gpuvm_pte':
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:840:20: warning:
 variable 'bo' set but not used [-Wunused-but-set-variable]

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:838:20: warning:
 variable 'vm' set but not used [-Wunused-but-set-variable]

They're never used since introduction in a46a2cd103a8 ("drm/amdgpu: Add GPUVM
memory management functions for KFD")

Signed-off-by: YueHaibing 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index d7b10d79f1de..63daf758cd03 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -837,13 +837,7 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
struct amdgpu_sync *sync)
 {
int ret;
-   struct amdgpu_vm *vm;
-   struct amdgpu_bo_va *bo_va;
-   struct amdgpu_bo *bo;
-
-   bo_va = entry->bo_va;
-   vm = bo_va->base.vm;
-   bo = bo_va->base.bo;
+   struct amdgpu_bo_va *bo_va = entry->bo_va;
 
/* Update the page tables  */
ret = amdgpu_vm_bo_update(adev, bo_va, false);





___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx