Re: [Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()

2019-02-06 Thread Daniel Vetter
On Wed, Feb 06, 2019 at 05:46:51PM +0100, Noralf Trønnes wrote:
> 
> 
> Den 06.02.2019 16.26, skrev Daniel Vetter:
> > On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote:
> >>
> >>
> >> Den 05.02.2019 17.31, skrev Daniel Vetter:
> >>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote:
> 
> 
>  Den 05.02.2019 10.11, skrev Daniel Vetter:
> > On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote:
> >>
> >>
> >> Den 04.02.2019 16.41, skrev Daniel Vetter:
> >>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote:
>  The only thing now that makes drm_dev_unplug() special is that it sets
>  drm_device->unplugged. Move this code to drm_dev_unregister() so that we
>  can remove drm_dev_unplug().
> 
>  Signed-off-by: Noralf Trønnes 
>  ---
> >>
> >> [...]
> >>
>   drivers/gpu/drm/drm_drv.c | 27 +++
>   include/drm/drm_drv.h | 10 --
>   2 files changed, 19 insertions(+), 18 deletions(-)
> 
>  diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>  index 05bbc2b622fc..e0941200edc6 100644
>  --- a/drivers/gpu/drm/drm_drv.c
>  +++ b/drivers/gpu/drm/drm_drv.c
>  @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit);
>    */
>   void drm_dev_unplug(struct drm_device *dev)
>   {
>  -/*
>  - * After synchronizing any critical read section is guaranteed to see
>  - * the new value of ->unplugged, and any critical section which might
>  - * still have seen the old value of ->unplugged is guaranteed to have
>  - * finished.
>  - */
>  -dev->unplugged = true;
>  -synchronize_srcu(&drm_unplug_srcu);
>  -
>   drm_dev_unregister(dev);
>   drm_dev_put(dev);
>   }
>  @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register);
>   * drm_dev_register() but does not deallocate the device. The caller must call
>   * drm_dev_put() to drop their final reference.
>   *
>  - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(),
>  - * which can be called while there are still open users of @dev.
>  + * This function can be called while there are still open users of @dev as long
>  + * as the driver protects its device resources using drm_dev_enter() and
>  + * drm_dev_exit().
>   *
>   * This should be called first in the device teardown code to make sure
>  - * userspace can't access the device instance any more.
>  + * userspace can't access the device instance any more. Drivers that support
>  + * device unplug will probably want to call drm_atomic_helper_shutdown() first
> >>>
> >>> Read once more with a bit more coffee, spotted this:
> >>>
> >>> s/first/afterwards/ - shutting down the hw before we've taken it away from
> >>> userspace is kinda the wrong way round. It should be the inverse of driver
> >>> load, which is 1) allocate structures 2) prep hw 3) register driver with
> >>> the world (simplified ofc).
> >>>
> >>
> >> The problem is that drm_dev_unregister() sets the device as unplugged
> >> and if drm_atomic_helper_shutdown() is called afterwards it's not
> >> allowed to touch hardware.
> >>
> >> I know it's the wrong order, but the only way to do it in the right
> >> order is to have a separate function that sets unplugged:
> >>
> >>drm_dev_unregister();
> >>drm_atomic_helper_shutdown();
> >>drm_dev_set_unplugged();
> >
> > Annoying ... but yeah calling _shutdown() before we stopped userspace is
> > also not going to work. Because userspace could quickly re-enable
> > something, and then the refcounts would be all wrong again and leaking
> > objects.
> >
> 
>  What happens with a USB device that is unplugged with open userspace,
>  will that leak objects?
> >>>
> >>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run
> >>> as normal, the only thing that should be skipped is actually touching the
> >>> hw (as long as the driver doesn't protect too much with
> >>> drm_dev_enter/exit). So all the software updates (including refcounting
> >>> updates) will still be done. Ofc current udl is not yet atomic, so in
> >>> reality something else happens.
> >>>
> >>> And we ofc still have the same issue that if you just unload the driver,
> >>> then the hw will stay on (which might really confuse the driver on next
> >>> load, when it assumes that it only gets loaded 

Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

2019-02-06 Thread Zhao, Yong
I see. I will drop this. How about the other two patches?

Regards,
Yong

From: Christian König 
Sent: Wednesday, February 6, 2019 1:20 PM
To: Alex Deucher; Zhao, Yong
Cc: Kuehling, Felix; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

Am 06.02.19 um 19:17 schrieb Alex Deucher:
> On Wed, Feb 6, 2019 at 12:01 PM Zhao, Yong  wrote:
>> Not too sure about this, but it looks strange in the code when all other 
>> similar code uses 64-bit.
> It's worth double checking, but Felix is right.  A number of IPs still
> used 32 bit doorbells on vega10.  E.g., multi-media for example.

I agree, the IH is certainly a 32bit doorbell on Vega10.

Christian.

>
> Alex
>
>> "Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell 
>> indexes". You mean the alternative is to multiply all the current 64 
>> doorbell index constants with 2, right? That might be easier and cleaner, 
>> and we need to make sure that the *2 and << 1 conversions from 64-bit index 
>> to dword index are all removed.
>>
>> Regards,
>> Yong
>> 
>> From: Kuehling, Felix
>> Sent: Wednesday, February 6, 2019 11:26 AM
>> To: Zhao, Yong; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
>>
>> Are you sure about this? Typically 64-bit doorbells don't wrap around. But 
>> this one does. If the IH doorbell wraps around, there is no reason why it 
>> needs to be 64-bit, so I suspect it may still be a 32-bit doorbell.
>>
>> AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. 
>> Therefore, maybe some of your other renaming changes should be reconsidered 
>> if they assume that all doorbells are 64-bit. Maybe it makes more sense to 
>> think of 64-bit doorbells as using 2 doorbell indexes.
>>
>> Regards,
>>Felix
>>
>> 
>> From: amd-gfx  on behalf of Zhao, 
>> Yong 
>> Sent: Wednesday, February 6, 2019 10:49 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Zhao, Yong
>> Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
>>
>> Clearly, it should be a 64-bit doorbell operation.
>>
>> Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c
>> Signed-off-by: Yong Zhao 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
>> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>> index 796004896661..36f0e3cada30 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>> @@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev,
>>  if (ih->use_doorbell) {
>>  /* XXX check if swapping is necessary on BE */
>>  *ih->rptr_cpu = ih->rptr;
>> -   WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr);
>> +   WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr);
>>  } else if (ih == &adev->irq.ih) {
>>  WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
>>  } else if (ih == &adev->irq.ih1) {
>> --
>> 2.17.1
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



RE: [PATCH 2/5] drm/amdgpu: Fix a bug in setting CP_MEC_DOORBELL_RANGE_UPPER on SOC15

2019-02-06 Thread Knott, William
Hey Oak,

Interesting you should ask since this has been discussed a lot here recently.  
So I didn't know until recently that the nbif routes doorbells only using 11:2. 
 So in each 4k page there is an area for the various doorbell recipients 
including cp gfx and compute.  

The doorbell upper/lower in the CP only control the setting of a doorbell 
updated bit and currently uses the full 27:2.  The CP also uses the full 27:2 
to match the doorbell to a queue slot and initiate work.

So for your example if you set the ranges to 4k then the CP would not consider 
a 6k address to match the MEC range and would not actually trigger this 
doorbell updated logic.  This logic is only needed for waking up an idle 
scheduler and for power gating support.

We are changing the hardware in the doorbell range check to only look at 11:2 
to match the nbif routing logic.  With the current hardware I am not even sure 
how you configure the range registers if you are using PASID in bits 27:12 of 
the doorbell.  The default for those registers says that compute gets 
everything from gfx (0x12) up to 3f.

How do you actually configure these currently, do you leave these defaults?   
That seems like the only way to have it pseudo work without the coming hardware 
change. 

Thanks
Clint


-Original Message-
From: Zeng, Oak 
Sent: Wednesday, February 06, 2019 10:23 AM
To: Zhao, Yong ; amd-gfx@lists.freedesktop.org; Knott, 
William ; Yang, Alice (SRDC 3D) 
Cc: Zhao, Yong 
Subject: RE: [PATCH 2/5] drm/amdgpu: Fix a bug in setting 
CP_MEC_DOORBELL_RANGE_UPPER on SOC15

+ Clint/Alice

Hi Clint,

We think CP_MEC_DOORBELL_RANGE_LOWER/UPPER registers are used by MEC to check 
whether a doorbell routed to MEC belongs to MEC, is this understanding correct? 

From the register spec, those registers are 27 bits. Does this mean MEC use all 
27 bits to determine? For example, if we set lower/upper to [0, 4k], will a 
doorbell ring at 6K address be ignored by MEC?

Thanks,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Zhao, Yong
Sent: Tuesday, February 5, 2019 3:31 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong 
Subject: [PATCH 2/5] drm/amdgpu: Fix a bug in setting 
CP_MEC_DOORBELL_RANGE_UPPER on SOC15

Because the CP can use all doorbells outside the ones reserved for SDMA, IH,
and VCN, the value set in CP_MEC_DOORBELL_RANGE_UPPER should be the maximal
index possible in a page.

Change-Id: I402a56ce9a80e6c2ed2f96be431ae71ca88e73a4
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h | 1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c        | 2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c | 3 +++
 drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c | 3 +++
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 5c8d04c353d0..90eca63605ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -73,6 +73,7 @@ struct amdgpu_doorbell_index {
} uvd_vce;
};
uint32_t max_assignment;
+   uint32_t last_idx;
/* Per engine SDMA doorbell size in dword */
uint32_t dw_range_per_sdma_eng;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 262ee3cf6f1c..0278e3ab6b94 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2998,7 +2998,7 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring *ring)
 WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
 (adev->doorbell_index.kiq * 2) << 2);
 WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
-   (adev->doorbell_index.userqueue_end * 2) << 2);
+   (adev->doorbell_index.last_idx * 2) << 2);
 }
 
 WREG32_SOC15(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
index d2409df2dde9..9eb8c9209231 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
@@ -88,5 +88,8 @@ void vega10_doorbell_index_init(struct amdgpu_device *adev)
(adev->doorbell_index.sdma_engine[1]
- adev->doorbell_index.sdma_engine[0])
* adev->doorbell_index.entry_dw_size;
+
+   adev->doorbell_index.last_idx = PAGE_SIZE
+   / (sizeof(uint32_t) * adev->doorbell_index.entry_dw_size) - 1;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
index b28c5999d8f0..aa8c7699c689 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
@@ -91,5 +91,8 @@ void vega20_doorbell_index_init(struct amdgpu_device *adev)

Re: [PATCH] drm: enable uncached DMA optimization for ARM and arm64

2019-02-06 Thread Christian König

Am 06.02.19 um 18:23 schrieb Ard Biesheuvel:

On Fri, 25 Jan 2019 at 11:35, Ard Biesheuvel  wrote:

On Fri, 25 Jan 2019 at 12:30, Christian König
 wrote:

Am 25.01.19 um 09:43 schrieb Ard Biesheuvel:

On Thu, 24 Jan 2019 at 15:01, Alex Deucher  wrote:

On Thu, Jan 24, 2019 at 9:00 AM Ard Biesheuvel
 wrote:

On Thu, 24 Jan 2019 at 13:31, Koenig, Christian
 wrote:

Am 24.01.19 um 13:06 schrieb Ard Biesheuvel:

The DRM driver stack is designed to work with cache coherent devices
only, but permits an optimization to be enabled in some cases, where
for some buffers, both the CPU and the GPU use uncached mappings,
removing the need for DMA snooping and allocation in the CPU caches.

The use of uncached GPU mappings relies on the correct implementation
of the PCIe NoSnoop TLP attribute by the platform, otherwise the GPU
will use cached mappings nonetheless. On x86 platforms, this does not
seem to matter, as uncached CPU mappings will snoop the caches in any
case. However, on ARM and arm64, enabling this optimization on a
platform where NoSnoop is ignored results in loss of coherency, which
breaks correct operation of the device. Since we have no way of
detecting whether NoSnoop works or not, just disable this
optimization entirely for ARM and arm64.

Cc: Christian Koenig 
Cc: Alex Deucher 
Cc: David Zhou 
Cc: Huang Rui 
Cc: Junwei Zhang 
Cc: Michel Daenzer 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Sean Paul 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Will Deacon 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: amd-gfx list 
Cc: dri-devel 
Reported-by: Carsten Haitzler 
Signed-off-by: Ard Biesheuvel 

The subject line should probably read "disable uncached...".


Ugh, of course ...


With that fixed the patch is Reviewed-by: Christian König
.


Same:
Reviewed-by: Alex Deucher 


Thanks all

Should I resend the patch with the subject corrected?

I will update the subject line and push it upstream through
drm-misc-next if nobody objects.


Wonderful, thanks.

Hi Christian,

Are you still planning to merge this for v5.1?


My bad, only pushed this to our internal branch, but forgot out 
drm-misc-next.


Fixed now, thanks for the reminder.

Christian.



Thanks,
Ard.


Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

2019-02-06 Thread Christian König

Am 06.02.19 um 19:17 schrieb Alex Deucher:

On Wed, Feb 6, 2019 at 12:01 PM Zhao, Yong  wrote:

Not too sure about this, but it looks strange in the code when all other 
similar code uses 64-bit.

It's worth double checking, but Felix is right.  A number of IPs still
used 32 bit doorbells on vega10.  E.g., multi-media for example.


I agree, the IH is certainly a 32bit doorbell on Vega10.

Christian.



Alex


"Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell indexes". 
You mean the alternative is to multiply all the current 64 doorbell index constants with 2, 
right? That might be easier and cleaner, and we need to make sure that the *2 and << 1 
conversions from 64-bit index to dword index are all removed.

Regards,
Yong

From: Kuehling, Felix
Sent: Wednesday, February 6, 2019 11:26 AM
To: Zhao, Yong; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

Are you sure about this? Typically 64-bit doorbells don't wrap around. But this 
one does. If the IH doorbell wraps around, there is no reason why it needs to 
be 64-bit, so I suspect it may still be a 32-bit doorbell.

AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. 
Therefore, maybe some of your other renaming changes should be reconsidered if 
they assume that all doorbells are 64-bit. Maybe it makes more sense to think 
of 64-bit doorbells as using 2 doorbell indexes.

Regards,
   Felix


From: amd-gfx  on behalf of Zhao, Yong 

Sent: Wednesday, February 6, 2019 10:49 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong
Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

Clearly, it should be a 64-bit doorbell operation.

Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c
Signed-off-by: Yong Zhao 
---
  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 796004896661..36f0e3cada30 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev,
 if (ih->use_doorbell) {
 /* XXX check if swapping is necessary on BE */
 *ih->rptr_cpu = ih->rptr;
-   WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr);
+   WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr);
 } else if (ih == &adev->irq.ih) {
 WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
 } else if (ih == &adev->irq.ih1) {
--
2.17.1



Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

2019-02-06 Thread Alex Deucher
On Wed, Feb 6, 2019 at 12:01 PM Zhao, Yong  wrote:
>
> Not too sure about this, but it looks strange in the code when all other 
> similar code uses 64-bit.

It's worth double checking, but Felix is right.  A number of IPs still
used 32 bit doorbells on vega10.  E.g., multi-media for example.

Alex

>
> "Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell 
> indexes". You mean the alternative is to multiply all the current 64 doorbell 
> index constants with 2, right? That might be easier and cleaner, and we need 
> to make sure that the *2 and << 1 conversions from 64-bit index to dword 
> index are all removed.
>
> Regards,
> Yong
> 
> From: Kuehling, Felix
> Sent: Wednesday, February 6, 2019 11:26 AM
> To: Zhao, Yong; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
>
> Are you sure about this? Typically 64-bit doorbells don't wrap around. But 
> this one does. If the IH doorbell wraps around, there is no reason why it 
> needs to be 64-bit, so I suspect it may still be a 32-bit doorbell.
>
> AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. 
> Therefore, maybe some of your other renaming changes should be reconsidered 
> if they assume that all doorbells are 64-bit. Maybe it makes more sense to 
> think of 64-bit doorbells as using 2 doorbell indexes.
>
> Regards,
>   Felix
>
> 
> From: amd-gfx  on behalf of Zhao, Yong 
> 
> Sent: Wednesday, February 6, 2019 10:49 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhao, Yong
> Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()
>
> Clearly, it should be a 64-bit doorbell operation.
>
> Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c
> Signed-off-by: Yong Zhao 
> ---
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> index 796004896661..36f0e3cada30 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> @@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev,
> if (ih->use_doorbell) {
> /* XXX check if swapping is necessary on BE */
> *ih->rptr_cpu = ih->rptr;
> -   WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr);
> +   WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr);
> } else if (ih == &adev->irq.ih) {
> WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
> } else if (ih == &adev->irq.ih1) {
> --
> 2.17.1
>


Re: [PATCH] drm: enable uncached DMA optimization for ARM and arm64

2019-02-06 Thread Ard Biesheuvel
On Fri, 25 Jan 2019 at 11:35, Ard Biesheuvel  wrote:
>
> On Fri, 25 Jan 2019 at 12:30, Christian König
>  wrote:
> >
> > Am 25.01.19 um 09:43 schrieb Ard Biesheuvel:
> > > On Thu, 24 Jan 2019 at 15:01, Alex Deucher  wrote:
> > >> On Thu, Jan 24, 2019 at 9:00 AM Ard Biesheuvel
> > >>  wrote:
> > >>> On Thu, 24 Jan 2019 at 13:31, Koenig, Christian
> > >>>  wrote:
> >  Am 24.01.19 um 13:06 schrieb Ard Biesheuvel:
> > > The DRM driver stack is designed to work with cache coherent devices
> > > only, but permits an optimization to be enabled in some cases, where
> > > for some buffers, both the CPU and the GPU use uncached mappings,
> > > removing the need for DMA snooping and allocation in the CPU caches.
> > >
> > > The use of uncached GPU mappings relies on the correct implementation
> > > of the PCIe NoSnoop TLP attribute by the platform, otherwise the GPU
> > > will use cached mappings nonetheless. On x86 platforms, this does not
> > > seem to matter, as uncached CPU mappings will snoop the caches in any
> > > case. However, on ARM and arm64, enabling this optimization on a
> > > platform where NoSnoop is ignored results in loss of coherency, which
> > > breaks correct operation of the device. Since we have no way of
> > > detecting whether NoSnoop works or not, just disable this
> > > optimization entirely for ARM and arm64.
> > >
> > > Cc: Christian Koenig 
> > > Cc: Alex Deucher 
> > > Cc: David Zhou 
> > > Cc: Huang Rui 
> > > Cc: Junwei Zhang 
> > > Cc: Michel Daenzer 
> > > Cc: David Airlie 
> > > Cc: Daniel Vetter 
> > > Cc: Maarten Lankhorst 
> > > Cc: Maxime Ripard 
> > > Cc: Sean Paul 
> > > Cc: Michael Ellerman 
> > > Cc: Benjamin Herrenschmidt 
> > > Cc: Will Deacon 
> > > Cc: Christoph Hellwig 
> > > Cc: Robin Murphy 
> > > Cc: amd-gfx list 
> > > Cc: dri-devel 
> > > Reported-by: Carsten Haitzler 
> > > Signed-off-by: Ard Biesheuvel 
> >  The subject line should probably read "disable uncached...".
> > 
> > >>> Ugh, of course ...
> > >>>
> >  With that fixed the patch is Reviewed-by: Christian König
> >  .
> > 
> > >> Same:
> > >> Reviewed-by: Alex Deucher 
> > >>
> > > Thanks all
> > >
> > > Should I resend the patch with the subject corrected?
> >
> > I will update the subject line and push it upstream through
> > drm-misc-next if nobody objects.
> >
>
> Wonderful, thanks.

Hi Christian,

Are you still planning to merge this for v5.1?

Thanks,
Ard.


Re: [Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()

2019-02-06 Thread Eric Anholt
Daniel Vetter  writes:
>
> Zooming out more looking at the big picture I'd say all your work in the
> past few years has enormously simplified drm for simple drivers already.
> If we can't resolve this one here right now that just means you "only"
> made drm 98% simpler instead of maybe 99%. It's still an epic win :-)

I'd like to second this.  So many of Noralf's cleanups I think "oof,
that's a lot of work for a little cleanup here".  But we've benefited
immensely from it accumulating over the years.  Thanks again!




4.20 regression: short time display corruption during resume

2019-02-06 Thread Rafał Miłecki
Hi,

I use HP EliteBook 745 G5 with Ryzen 5 PRO 2500U. With kernels 4.20
and 5.0.0-rc3 I see a display corruption during resume from RAM. It's
a short-lived corruption (black screen only); after less than a second
the expected image appears, so nothing critical.

It's a regression though so I somehow hope it may be easy to fix. It's
caused by the commit 009d9ed6c4b7 ("drm/amdgpu: Change AI gfx/sdma/smu
init sequence").

I've created a bugzilla report for it:
https://bugs.freedesktop.org/show_bug.cgi?id=109554
(includes corruption photo).

Could someone look at this, please? I'm happy to test any patches.

--
Rafał


Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

2019-02-06 Thread Zhao, Yong
Not too sure about this, but it looks strange in the code when all other 
similar code uses 64-bit.

"Maybe it makes more sense to think of 64-bit doorbells as using 2 doorbell 
indexes". You mean the alternative is to multiply all the current 64 doorbell 
index constants with 2, right? That might be easier and cleaner, and we need to 
make sure that the *2 and << 1 conversions from 64-bit index to dword index are 
all removed.

Regards,
Yong

From: Kuehling, Felix
Sent: Wednesday, February 6, 2019 11:26 AM
To: Zhao, Yong; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

Are you sure about this? Typically 64-bit doorbells don't wrap around. But this 
one does. If the IH doorbell wraps around, there is no reason why it needs to 
be 64-bit, so I suspect it may still be a 32-bit doorbell.

AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. 
Therefore, maybe some of your other renaming changes should be reconsidered if 
they assume that all doorbells are 64-bit. Maybe it makes more sense to think 
of 64-bit doorbells as using 2 doorbell indexes.

Regards,
  Felix


From: amd-gfx  on behalf of Zhao, Yong 

Sent: Wednesday, February 6, 2019 10:49 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong
Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

Clearly, it should be a 64-bit doorbell operation.

Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 796004896661..36f0e3cada30 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev,
if (ih->use_doorbell) {
/* XXX check if swapping is necessary on BE */
*ih->rptr_cpu = ih->rptr;
-   WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr);
+   WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr);
} else if (ih == >irq.ih) {
WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
} else if (ih == >irq.ih1) {
--
2.17.1



Re: [Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()

2019-02-06 Thread Noralf Trønnes


Den 06.02.2019 16.26, skrev Daniel Vetter:
> On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 05.02.2019 17.31, skrev Daniel Vetter:
>>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote:


 Den 05.02.2019 10.11, skrev Daniel Vetter:
> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 04.02.2019 16.41, skrev Daniel Vetter:
>>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote:
 The only thing now that makes drm_dev_unplug() special is that it sets
 drm_device->unplugged. Move this code to drm_dev_unregister() so that we
 can remove drm_dev_unplug().

 Signed-off-by: Noralf Trønnes 
 ---
>>
>> [...]
>>
  drivers/gpu/drm/drm_drv.c | 27 +++
  include/drm/drm_drv.h | 10 --
  2 files changed, 19 insertions(+), 18 deletions(-)

 diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
 index 05bbc2b622fc..e0941200edc6 100644
 --- a/drivers/gpu/drm/drm_drv.c
 +++ b/drivers/gpu/drm/drm_drv.c
 @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit);
   */
  void drm_dev_unplug(struct drm_device *dev)
  {
 -  /*
 -   * After synchronizing any critical read section is guaranteed 
 to see
 -   * the new value of ->unplugged, and any critical section which 
 might
 -   * still have seen the old value of ->unplugged is guaranteed 
 to have
 -   * finished.
 -   */
 -  dev->unplugged = true;
 -  synchronize_srcu(_unplug_srcu);
 -
drm_dev_unregister(dev);
drm_dev_put(dev);
  }
 @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register);
   * drm_dev_register() but does not deallocate the device. The caller must call
   * drm_dev_put() to drop their final reference.
   *
  - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(),
  - * which can be called while there are still open users of @dev.
  + * This function can be called while there are still open users of @dev as long
  + * as the driver protects its device resources using drm_dev_enter() and
  + * drm_dev_exit().
   *
   * This should be called first in the device teardown code to make sure
  - * userspace can't access the device instance any more.
  + * userspace can't access the device instance any more. Drivers that support
  + * device unplug will probably want to call drm_atomic_helper_shutdown() first
>>>
>>> Read once more with a bit more coffee, spotted this:
>>>
>>> s/first/afterwards/ - shutting down the hw before we've taken it away from
>>> userspace is kinda the wrong way round. It should be the inverse of driver
>>> load, which is 1) allocate structures 2) prep hw 3) register driver with
>>> the world (simplified ofc).
>>>
>>
>> The problem is that drm_dev_unregister() sets the device as unplugged
>> and if drm_atomic_helper_shutdown() is called afterwards it's not
>> allowed to touch hardware.
>>
>> I know it's the wrong order, but the only way to do it in the right
>> order is to have a separate function that sets unplugged:
>>
>>  drm_dev_unregister();
>>  drm_atomic_helper_shutdown();
>>  drm_dev_set_unplugged();
>
> Annoying ... but yeah calling _shutdown() before we stopped userspace is
> also not going to work. Because userspace could quickly re-enable
> something, and then the refcounts would be all wrong again and leaking
> objects.
>

 What happens with a USB device that is unplugged with open userspace,
 will that leak objects?
>>>
>>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run
>>> as normal, the only thing that should be skipped is actually touching the
>>> hw (as long as the driver doesn't protect too much with
>>> drm_dev_enter/exit). So all the software updates (including refcounting
>>> updates) will still be done. Ofc current udl is not yet atomic, so in
>>> reality something else happens.
>>>
>>> And we ofc still have the same issue that if you just unload the driver,
>>> then the hw will stay on (which might really confuse the driver on next
>>> load, when it assumes that it only gets loaded from cold boot where
>>> everything is off - which usually is the case on an arm soc at least).
>>>
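To make the drm_dev_enter()/drm_dev_exit() pattern discussed above concrete, here is a minimal userspace sketch. It is a toy model, not the kernel API: the real implementation synchronizes readers against the unplug with SRCU (drm_unplug_srcu) rather than reading a bare flag.

```c
#include <stdbool.h>
#include <errno.h>

/* Toy model of the pattern above: once ->unplugged is set, critical
 * sections that would touch hardware are skipped. The real kernel code
 * uses SRCU so in-flight critical sections are waited for. */
struct toy_drm_device {
	bool unplugged;
};

static bool toy_dev_enter(struct toy_drm_device *dev)
{
	return !dev->unplugged;	/* false: caller must not touch hw */
}

static void toy_dev_exit(struct toy_drm_device *dev)
{
	(void)dev;		/* nothing to do in the toy model */
}

static int toy_touch_hw(struct toy_drm_device *dev, int *hw_writes)
{
	if (!toy_dev_enter(dev))
		return -ENODEV;	/* device already unplugged */
	(*hw_writes)++;		/* stand-in for real register access */
	toy_dev_exit(dev);
	return 0;
}
```

In this model, drm_atomic_helper_shutdown() called after "unplug" would still run its software teardown, but every hardware access guarded this way becomes a no-op, which is the behaviour the thread is discussing.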
> I get a bit the feeling we're over-optimizing here with trying to devm-ize
> drm_dev_register. Just getting drm_device correctly devm-ized is a big
> step forward already, and will open up a lot of TODO 

[PATCH 2/3] drm/amdkfd: avoid HMM change cause circular lock dependency v2

2019-02-06 Thread Yang, Philip
There is a circular lock dependency between the gfx and kfd paths with the
HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock

To avoid this, move init/uninit_mqd() out of lock(dqm), to remove the nested
locking between mmap_sem and bo::reserve. The resulting locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
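The reordering can be sketched with plain pthread mutexes. The names are illustrative stand-ins for dqm_lock and bo::reserve, not the real kfd code:

```c
#include <pthread.h>

/* Sketch of the fix above: the init_mqd()-style work that needs
 * "bo::reserve" runs before "dqm" is taken, so every path acquires the
 * two locks in the same global order and no lock cycle can form. */
static pthread_mutex_t bo_reserve = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dqm = PTHREAD_MUTEX_INITIALIZER;

static int create_queue(int *queue_count)
{
	/* init_mqd()-style allocation, done outside lock(dqm) */
	pthread_mutex_lock(&bo_reserve);
	/* ... allocate and initialize MQD backing store ... */
	pthread_mutex_unlock(&bo_reserve);

	/* only the list/bookkeeping updates run under lock(dqm) */
	pthread_mutex_lock(&dqm);
	(*queue_count)++;
	pthread_mutex_unlock(&dqm);
	return 0;
}
```

The same idea applies in reverse on teardown: uninit_mqd() moves after dqm_unlock(), as the patch below does.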

Change-Id: I2ec09a47571f6b4c8eaef93f22c0a600f5f70153
Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 32 ++-
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 8372556b52eb..efe0d3c0215b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1156,21 +1156,17 @@ static int create_queue_cpsch(struct 
device_queue_manager *dqm, struct queue *q,
int retval;
struct mqd_manager *mqd_mgr;
 
-   retval = 0;
-
-   dqm_lock(dqm);
-
if (dqm->total_queue_count >= max_num_of_queues_per_device) {
pr_warn("Can't create new usermode queue because %d queues were 
already created\n",
dqm->total_queue_count);
retval = -EPERM;
-   goto out_unlock;
+   goto out;
}
 
if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
retval = allocate_sdma_queue(dqm, >sdma_id);
if (retval)
-   goto out_unlock;
+   goto out;
q->properties.sdma_queue_id =
q->sdma_id / get_num_sdma_engines(dqm);
q->properties.sdma_engine_id =
@@ -1181,6 +1177,9 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
if (retval)
goto out_deallocate_sdma_queue;
 
+   /* Do init_mqd before dqm_lock(dqm) to avoid circular locking order:
+* lock(dqm) -> bo::reserve
+*/
mqd_mgr = dqm->ops.get_mqd_manager(dqm,
get_mqd_type_from_queue_type(q->properties.type));
 
@@ -1188,6 +1187,7 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
retval = -ENOMEM;
goto out_deallocate_doorbell;
}
+
/*
 * Eviction state logic: we only mark active queues as evicted
 * to avoid the overhead of restoring inactive queues later
@@ -1196,9 +1196,7 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
q->properties.is_evicted = (q->properties.queue_size > 0 &&
q->properties.queue_percent > 0 &&
q->properties.queue_address != 0);
-
dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
-
q->properties.tba_addr = qpd->tba_addr;
q->properties.tma_addr = qpd->tma_addr;
retval = mqd_mgr->init_mqd(mqd_mgr, >mqd, >mqd_mem_obj,
@@ -1206,6 +1204,8 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
if (retval)
goto out_deallocate_doorbell;
 
+   dqm_lock(dqm);
+
list_add(>list, >queues_list);
qpd->queue_count++;
if (q->properties.is_active) {
@@ -1233,9 +1233,7 @@ static int create_queue_cpsch(struct device_queue_manager 
*dqm, struct queue *q,
 out_deallocate_sdma_queue:
if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
deallocate_sdma_queue(dqm, q->sdma_id);
-out_unlock:
-   dqm_unlock(dqm);
-
+out:
return retval;
 }
 
@@ -1398,8 +1396,6 @@ static int destroy_queue_cpsch(struct 
device_queue_manager *dqm,
qpd->reset_wavefronts = true;
}
 
-   mqd_mgr->uninit_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
-
/*
 * Unconditionally decrement this counter, regardless of the queue's
 * type
@@ -1410,6 +1406,9 @@ static int destroy_queue_cpsch(struct 
device_queue_manager *dqm,
 
dqm_unlock(dqm);
 
+   /* Do uninit_mqd after dqm_unlock(dqm) to avoid circular locking */
+   mqd_mgr->uninit_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
+
return retval;
 
 failed:
@@ -1631,7 +1630,11 @@ static int process_termination_cpsch(struct 
device_queue_manager *dqm,
qpd->reset_wavefronts = false;
}
 
-   /* lastly, free mqd resources */
+   dqm_unlock(dqm);
+
+   /* Lastly, free mqd resources.
+* Do uninit_mqd() after dqm_unlock to avoid circular locking.
+*/
list_for_each_entry_safe(q, next, >queues_list, list) {
mqd_mgr = dqm->ops.get_mqd_manager(dqm,
get_mqd_type_from_queue_type(q->properties.type));
@@ -1645,7 +1648,6 @@ static int process_termination_cpsch(struct 
device_queue_manager *dqm,
}
 
 out:
-   dqm_unlock(dqm);
return 

[PATCH 1/3] drm/amdgpu: use HMM mirror callback to replace mmu notifier v7

2019-02-06 Thread Yang, Philip
Replace our MMU notifier with the hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as dependencies of the
DRM_AMDGPU_USERPTR Kconfig option.

It supports both KFD userptr and gfx userptr paths.

Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig |   6 +-
 drivers/gpu/drm/amd/amdgpu/Makefile|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 154 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |   2 +-
 4 files changed, 70 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 9221e5489069..960a63355705 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
 config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
depends on DRM_AMDGPU
-   select MMU_NOTIFIER
+   select HMM_MIRROR
help
- This option selects CONFIG_MMU_NOTIFIER if it isn't already
- selected to enabled full userptr support.
+ This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+ isn't already selected to enabled full userptr support.
 
 config DRM_AMDGPU_GART_DEBUGFS
bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 466da5954a68..851001ced5e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -172,7 +172,7 @@ endif
 amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
 amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
 amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM_MIRROR) += amdgpu_mn.o
 
 include $(FULL_AMD_PATH)/powerplay/Makefile
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 3e6823fdd939..e356867d2308 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -45,7 +45,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -58,7 +58,6 @@
  *
  * @adev: amdgpu device pointer
  * @mm: process address space
- * @mn: MMU notifier structure
  * @type: type of MMU notifier
  * @work: destruction work item
  * @node: hash table node to find structure by adev and mn
@@ -66,6 +65,7 @@
  * @objects: interval tree containing amdgpu_mn_nodes
  * @read_lock: mutex for recursive locking of @lock
  * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
  *
  * Data for each amdgpu device and process address space.
  */
@@ -73,7 +73,6 @@ struct amdgpu_mn {
/* constant after initialisation */
struct amdgpu_device*adev;
struct mm_struct*mm;
-   struct mmu_notifier mn;
enum amdgpu_mn_type type;
 
/* only used on destruction */
@@ -85,8 +84,9 @@ struct amdgpu_mn {
/* objects protected by lock */
struct rw_semaphore lock;
struct rb_root_cached   objects;
-   struct mutexread_lock;
-   atomic_trecursion;
+
+   /* HMM mirror */
+   struct hmm_mirror   mirror;
 };
 
 /**
@@ -103,7 +103,7 @@ struct amdgpu_mn_node {
 };
 
 /**
- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
  *
  * @work: previously sheduled work item
  *
@@ -129,28 +129,26 @@ static void amdgpu_mn_destroy(struct work_struct *work)
}
up_write(>lock);
mutex_unlock(>mn_lock);
-   mmu_notifier_unregister_no_release(>mn, amn->mm);
+
+   hmm_mirror_unregister(>mirror);
kfree(amn);
 }
 
 /**
- * amdgpu_mn_release - callback to notify about mm destruction
+ * amdgpu_hmm_mirror_release - callback to notify about mm destruction
  *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
  *
- * Shedule a work item to lazy destroy our notifier.
+ * Shedule a work item to lazy destroy HMM mirror.
  */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
- struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
 {
-   struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+   struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
 
INIT_WORK(>work, amdgpu_mn_destroy);
schedule_work(>work);
 }
 
-
 /**
  * amdgpu_mn_lock - take the write side lock for this notifier
  *
@@ -181,14 +179,10 @@ void amdgpu_mn_unlock(struct amdgpu_mn *mn)
 static int amdgpu_mn_read_lock(struct amdgpu_mn *amn, bool blockable)
 {
if (blockable)
-   mutex_lock(>read_lock);
-   else if (!mutex_trylock(>read_lock))
+   down_read(>lock);
+   else if 

[PATCH 0/3] Use HMM to replace get_user_pages

2019-02-06 Thread Yang, Philip
Hi Christian,

Resend patch 1/3, 2/3, added Reviewed-by in comments.

Changed in patch 3/3: amdgpu_cs_submit/amdgpu_cs_ioctl now return -EAGAIN
to user space so it can retry cs_ioctl.
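A hypothetical userspace retry loop for this -EAGAIN contract might look like the following. Both submit_with_retry() and the toy callback are illustrative, not libdrm API:

```c
#include <errno.h>

/* Sketch of the retry contract described above: the CS ioctl returns
 * -EAGAIN when userptr pages were invalidated mid-submission, and
 * userspace simply resubmits the same command stream. */
static int submit_with_retry(int (*cs_ioctl)(void *), void *args)
{
	int ret;

	do {
		ret = cs_ioctl(args);
	} while (ret == -EAGAIN);

	return ret;
}

/* toy "kernel" that invalidates twice before the submission sticks */
static int toy_calls;
static int toy_cs_ioctl(void *args)
{
	(void)args;
	return (++toy_calls < 3) ? -EAGAIN : 0;
}
```

Keeping the bounded-retry decision in userspace avoids the arbitrary in-kernel retry count debated later in this thread.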

Regards,
Philip

Philip Yang (3):
  drm/amdgpu: use HMM mirror callback to replace mmu notifier v7
  drm/amdkfd: avoid HMM change cause circular lock dependency v2
  drm/amdgpu: replace get_user_pages with HMM address mirror helpers v8

 drivers/gpu/drm/amd/amdgpu/Kconfig|   6 +-
 drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 -
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 138 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c| 179 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 178 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   3 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  32 ++--
 12 files changed, 269 insertions(+), 387 deletions(-)

-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

2019-02-06 Thread Kuehling, Felix
Are you sure about this? Typically 64-bit doorbells don't wrap around. But this 
one does. If the IH doorbell wraps around, there is no reason why it needs to 
be 64-bit, so I suspect it may still be a 32-bit doorbell.

AFAIK, not all doorbells on Vega10 are 64-bit. It depends on the IP block. 
Therefore, maybe some of your other renaming changes should be reconsidered if 
they assume that all doorbells are 64-bit. Maybe it makes more sense to think 
of 64-bit doorbells as using 2 doorbell indexes.
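Felix's point about 64-bit doorbells consuming two dword slots can be sketched as follows; this is a hypothetical helper for illustration, not code from the patch:

```c
#include <stdint.h>

/* If doorbell offsets are tracked in dwords, a 64-bit doorbell write
 * (WDOORBELL64) covers two consecutive dword slots, so allocating one
 * must advance the index by 2 rather than 1. */
static inline uint32_t next_doorbell_index(uint32_t index_in_dw,
					   int is_64bit)
{
	return index_in_dw + (is_64bit ? 2 : 1);
}
```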

Regards,
  Felix


From: amd-gfx  on behalf of Zhao, Yong 

Sent: Wednesday, February 6, 2019 10:49 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong
Subject: [PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

Clearly, it should be a 64-bit doorbell operation.

Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 796004896661..36f0e3cada30 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev,
if (ih->use_doorbell) {
/* XXX check if swapping is necessary on BE */
*ih->rptr_cpu = ih->rptr;
-   WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr);
+   WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr);
} else if (ih == >irq.ih) {
WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
} else if (ih == >irq.ih1) {
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v8

2019-02-06 Thread Yang, Philip
Use the HMM helper function hmm_vma_fault() to get the physical pages backing
a userptr and to start tracking CPU page table updates on those pages. Then
use hmm_vma_range_done() to check whether those pages were updated before
amdgpu_cs_submit for gfx, or before user queues are resumed for kfd.

If the userptr pages were updated: for gfx, amdgpu_cs_ioctl will restart
from scratch; for kfd, the restore worker is rescheduled to retry.

HMM simplifies the concurrent CPU page table update check, so remove the
guptasklock, mmu_invalidations and last_set_pages fields from the
amdgpu_ttm_tt struct.

HMM does not pin pages (it does not raise the page refcount), so remove the
related operations like release_pages(), put_page() and mark_page_dirty().

Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 -
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 138 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c|  25 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 178 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   3 +-
 9 files changed, 182 insertions(+), 278 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 0b31a1859023..0e1711a75b68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -61,7 +61,6 @@ struct kgd_mem {
 
atomic_t invalid;
struct amdkfd_process_info *process_info;
-   struct page **user_pages;
 
struct amdgpu_sync sync;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index d7b10d79f1de..ae2d838d31ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, struct 
mm_struct *mm,
goto out;
}
 
-   /* If no restore worker is running concurrently, user_pages
-* should not be allocated
-*/
-   WARN(mem->user_pages, "Leaking user_pages array");
-
-   mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
-  sizeof(struct page *),
-  GFP_KERNEL | __GFP_ZERO);
-   if (!mem->user_pages) {
-   pr_err("%s: Failed to allocate pages array\n", __func__);
-   ret = -ENOMEM;
-   goto unregister_out;
-   }
-
-   ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages);
+   ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages);
if (ret) {
pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
-   goto free_out;
+   goto unregister_out;
}
 
-   amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages);
-
ret = amdgpu_bo_reserve(bo, true);
if (ret) {
pr_err("%s: Failed to reserve BO\n", __func__);
@@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, struct 
mm_struct *mm,
amdgpu_bo_unreserve(bo);
 
 release_out:
-   if (ret)
-   release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
-free_out:
-   kvfree(mem->user_pages);
-   mem->user_pages = NULL;
+   amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
 unregister_out:
if (ret)
amdgpu_mn_unregister(bo);
@@ -679,7 +659,6 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
ctx->kfd_bo.priority = 0;
ctx->kfd_bo.tv.bo = >tbo;
ctx->kfd_bo.tv.num_shared = 1;
-   ctx->kfd_bo.user_pages = NULL;
list_add(>kfd_bo.tv.head, >list);
 
amdgpu_vm_get_pd_bo(vm, >list, >vm_pd[0]);
@@ -743,7 +722,6 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
ctx->kfd_bo.priority = 0;
ctx->kfd_bo.tv.bo = >tbo;
ctx->kfd_bo.tv.num_shared = 1;
-   ctx->kfd_bo.user_pages = NULL;
list_add(>kfd_bo.tv.head, >list);
 
i = 0;
@@ -1371,15 +1349,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
list_del(_list_entry->head);
mutex_unlock(_info->lock);
 
-   /* Free user pages if necessary */
-   if (mem->user_pages) {
-   pr_debug("%s: Freeing user_pages array\n", __func__);
-   if (mem->user_pages[0])
-   release_pages(mem->user_pages,
-   mem->bo->tbo.ttm->num_pages);
-   kvfree(mem->user_pages);
-   }
-
ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, );
if (unlikely(ret))
return ret;
@@ -1855,25 +1824,11 @@ static int update_invalid_user_pages(struct 

Re: [PATCH v2 2/2] drm/amd/powerplay: add override pcie parameters for Vega20 (v2)

2019-02-06 Thread Huang, JinHuiEric
Reviewed-by: Eric Huang 

On 2019-02-05 4:37 p.m., Kasiviswanathan, Harish wrote:
> v2: Fix SMU message format
>  Send override message after SMU enable features
>
> Change-Id: Ib880c83bc7aa12be370cf6619acfe66e12664c9c
> Signed-off-by: Harish Kasiviswanathan 
> ---
>   drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 45 
> +-
>   1 file changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> index da022ca..e1b1656 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> @@ -771,40 +771,49 @@ static int vega20_init_smc_table(struct pp_hwmgr *hwmgr)
>   return 0;
>   }
>   
> +/*
> + * Override PCIe link speed and link width for DPM Level 1. PPTable entries
> + * reflect the ASIC capabilities and not the system capabilities. For e.g.
> + * Vega20 board in a PCI Gen3 system. In this case, when SMU's tries to 
> switch
> + * to DPM1, it fails as system doesn't support Gen4.
> + */
>   static int vega20_override_pcie_parameters(struct pp_hwmgr *hwmgr)
>   {
>   struct amdgpu_device *adev = (struct amdgpu_device *)(hwmgr->adev);
> - uint32_t pcie_speed = 0, pcie_width = 0, pcie_arg;
> + uint32_t pcie_gen = 0, pcie_width = 0, smu_pcie_arg;
>   int ret;
>   
>   if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN4)
> - pcie_speed = 16;
> + pcie_gen = 3;
>   else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
> - pcie_speed = 8;
> + pcie_gen = 2;
>   else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
> - pcie_speed = 5;
> + pcie_gen = 1;
>   else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN1)
> - pcie_speed = 2;
> + pcie_gen = 0;
>   
>   if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X32)
> - pcie_width = 32;
> + pcie_width = 7;
>   else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X16)
> - pcie_width = 16;
> + pcie_width = 6;
>   else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X12)
> - pcie_width = 12;
> + pcie_width = 5;
>   else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X8)
> - pcie_width = 8;
> - else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X4)
>   pcie_width = 4;
> + else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X4)
> + pcie_width = 3;
>   else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X2)
>   pcie_width = 2;
>   else if (adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X1)
>   pcie_width = 1;
>   
> - pcie_arg = pcie_width | (pcie_speed << 8);
> -
> + /* Bit 31:16: LCLK DPM level. 0 is DPM0, and 1 is DPM1
> +  * Bit 15:8:  PCIE GEN, 0 to 3 corresponds to GEN1 to GEN4
> +  * Bit 7:0:   PCIE lane width, 1 to 7 corresponds is x1 to x32
> +  */
> + smu_pcie_arg = (1 << 16) | (pcie_gen << 8) | pcie_width;
>   ret = smum_send_msg_to_smc_with_parameter(hwmgr,
> - PPSMC_MSG_OverridePcieParameters, pcie_arg);
> + PPSMC_MSG_OverridePcieParameters, smu_pcie_arg);
>   PP_ASSERT_WITH_CODE(!ret,
>   "[OverridePcieParameters] Attempt to override pcie params 
> failed!",
>   return ret);
> @@ -1611,11 +1620,6 @@ static int vega20_enable_dpm_tasks(struct pp_hwmgr 
> *hwmgr)
>   "[EnableDPMTasks] Failed to initialize SMC table!",
>   return result);
>   
> - result = vega20_override_pcie_parameters(hwmgr);
> - PP_ASSERT_WITH_CODE(!result,
> - "[EnableDPMTasks] Failed to override pcie parameters!",
> - return result);
> -
>   result = vega20_run_btc(hwmgr);
>   PP_ASSERT_WITH_CODE(!result,
>   "[EnableDPMTasks] Failed to run btc!",
> @@ -1631,6 +1635,11 @@ static int vega20_enable_dpm_tasks(struct pp_hwmgr 
> *hwmgr)
>   "[EnableDPMTasks] Failed to enable all smu features!",
>   return result);
>   
> + result = vega20_override_pcie_parameters(hwmgr);
> + PP_ASSERT_WITH_CODE(!result,
> + "[EnableDPMTasks] Failed to override pcie parameters!",
> + return result);
> +
>   result = vega20_notify_smc_display_change(hwmgr);
>   PP_ASSERT_WITH_CODE(!result,
>   "[EnableDPMTasks] Failed to notify smc display change!",
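The smu_pcie_arg layout described in the patch comment above can be sketched as follows; the field meanings are taken from that comment, and the helper itself is illustrative:

```c
#include <stdint.h>

/* Bits 31:16 hold the LCLK DPM level, bits 15:8 the PCIe gen code
 * (0..3 for Gen1..Gen4), and bits 7:0 the lane-width code (1..7 for
 * x1..x32), matching the comment in vega20_override_pcie_parameters(). */
static inline uint32_t make_smu_pcie_arg(uint32_t dpm_level,
					 uint32_t pcie_gen,
					 uint32_t pcie_width)
{
	return (dpm_level << 16) | (pcie_gen << 8) | pcie_width;
}
```

For example, DPM1 at Gen3/x16 would be encoded as make_smu_pcie_arg(1, 2, 6).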


Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v7

2019-02-06 Thread Koenig, Christian
Am 06.02.19 um 16:53 schrieb Yang, Philip:
>   >> +/* If userptr are updated after amdgpu_cs_parser_bos(), restart
>   >> cs */
>   >>   amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>   >>   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>   >> -if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
>   >> -r = -ERESTARTSYS;
>   >> -goto error_abort;
>   >> -}
>   >> +r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
>   >
>   > Just abort when you see the first one with a problem here.
>   >
>   > There is no value in checking all of them.
>   >
> No, amdgpu_ttm_tt_get_user_pages_done calls hmm_vma_range_done to stop
> HMM track and free range structure memory, this needs go through all
> userptr BOs.

Well that actually sounds like an ugliness in the hmm_vma_range_done 
interface.

Need to double check the code and maybe sync up with Jerome if that is 
really a good idea.

But that won't affect you work, so feel free to go ahead for now.

>
>   >> +
>   >> +if (r == -EAGAIN) {
>   >> +if (!--tries) {
>   >> +DRM_ERROR("Possible deadlock? Retry too many times\n");
>   >> +return -EDEADLK;
>   >> +}
>   >> +goto restart;
>   >> +}
>   >> +
>   >
>   > I would still say to better just return to userspace here.
>   >
>   > Because of the way HMM works 10 retries might not be sufficient any more.
>   >
> Yes, it looks better to handle retry from user space. The extra sys call
> overhead can be ignored because this does not happen all the time. I
> will submit new patch for review.

Thanks, apart from the issues mentioned so far this looks good to me.

Feel free to add an Acked-by: Christian König  
to the patch.

I still need to double check if we don't have any locking problems 
(inversions, missing locks etc...). When this is done I can also give 
you and rb for the patch.

Christian.

>
> Thanks,
> Philip
>
> On 2019-02-06 4:20 a.m., Christian König wrote:
>> Am 05.02.19 um 23:00 schrieb Yang, Philip:
>>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>>> userptr and start CPU page table update track of those pages. Then use
>>> hmm_vma_range_done() to check if those pages are updated before
>>> amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
>>>
>>> If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
>>> from scratch, for kfd, restore worker is rescheduled to retry.
>>>
>>> HMM simplify the CPU page table concurrent update check, so remove
>>> guptasklock, mmu_invalidations, last_set_pages fields from
>>> amdgpu_ttm_tt struct.
>>>
>>> HMM does not pin the page (increase page ref count), so remove related
>>> operations like release_pages(), put_page(), mark_page_dirty().
>>>
>>> Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67
>>> Signed-off-by: Philip Yang 
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   1 -
>>>    .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   2 +-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c    | 158 +++-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c    |  25 ++-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h    |   4 +-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 178 +++---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   3 +-
>>>    9 files changed, 198 insertions(+), 282 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> index 0b31a1859023..0e1711a75b68 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> @@ -61,7 +61,6 @@ struct kgd_mem {
>>>    atomic_t invalid;
>>>    struct amdkfd_process_info *process_info;
>>> -    struct page **user_pages;
>>>    struct amdgpu_sync sync;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index d7b10d79f1de..ae2d838d31ea 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem,
>>> struct mm_struct *mm,
>>>    goto out;
>>>    }
>>> -    /* If no restore worker is running concurrently, user_pages
>>> - * should not be allocated
>>> - */
>>> -    WARN(mem->user_pages, "Leaking user_pages array");
>>> -
>>> -    mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
>>> -   sizeof(struct page *),
>>> -   GFP_KERNEL | __GFP_ZERO);
>>> -    if (!mem->user_pages) {
>>> -    pr_err("%s: Failed to allocate pages array\n", __func__);
>>> -    ret = -ENOMEM;
>>> -    goto unregister_out;
>>> -    }
>>> -
>>> -    ret 

Re: [PATCH 1/3] drm/amdgpu: Improve doorbell variable names

2019-02-06 Thread Zhao, Yong
Hi Alex,

I only added the corresponding changes for IH, not for GFX, as GFX does not
use similar range functions.

Regards,
Yong

From: Zhao, Yong
Sent: Wednesday, February 6, 2019 10:49 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong
Subject: [PATCH 1/3] drm/amdgpu: Improve doorbell variable names

Indicate that the doorbell offset and range are in dwords.

Change-Id: Ib0f2564ffa7b1940ffb8725cdc03f662184f5436
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h   |  2 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   |  3 ++-
 drivers/gpu/drm/amd/amdgpu/soc15.c   |  4 ++--
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c|  6 +++---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c   |  6 +++---
 drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c |  6 +-
 12 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d67f8b1dfe80..88b3bbcea756 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -642,13 +642,13 @@ struct amdgpu_nbio_funcs {
 void (*hdp_flush)(struct amdgpu_device *adev, struct amdgpu_ring 
*ring);
 u32 (*get_memsize)(struct amdgpu_device *adev);
 void (*sdma_doorbell_range)(struct amdgpu_device *adev, int instance,
-   bool use_doorbell, int doorbell_index, int 
doorbell_size);
+   bool use_doorbell, int index_in_dw, int range_dw_size);
 void (*enable_doorbell_aperture)(struct amdgpu_device *adev,
  bool enable);
 void (*enable_doorbell_selfring_aperture)(struct amdgpu_device *adev,
   bool enable);
 void (*ih_doorbell_range)(struct amdgpu_device *adev,
- bool use_doorbell, int doorbell_index);
+ bool use_doorbell, int index_in_dw);
 void (*update_medium_grain_clock_gating)(struct amdgpu_device *adev,
  bool enable);
 void (*update_medium_grain_light_sleep)(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 1cfec06f81d4..5c8d04c353d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -39,6 +39,7 @@ struct amdgpu_doorbell {
  * can be 64-bit, so the index defined is in qword.
  */
 struct amdgpu_doorbell_index {
+   uint32_t entry_dw_size;
 uint32_t kiq;
 uint32_t mec_ring0;
 uint32_t mec_ring1;
@@ -73,7 +74,7 @@ struct amdgpu_doorbell_index {
 };
 uint32_t max_assignment;
 /* Per engine SDMA doorbell size in dword */
-   uint32_t sdma_doorbell_range;
+   uint32_t dw_range_per_sdma_eng;
 };

 typedef enum _AMDGPU_DOORBELL_ASSIGNMENT
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
index 1ccb1831382a..2572191b394a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
@@ -33,7 +33,7 @@ struct amdgpu_iv_entry;
 struct amdgpu_ih_ring {
 unsignedring_size;
 uint32_tptr_mask;
-   u32 doorbell_index;
+   u32 doorbell_idx_in_dw;
 booluse_doorbell;
 booluse_bus_addr;

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index cc967dbfd631..bcc41c957b24 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -68,7 +68,7 @@ static u32 nbio_v6_1_get_memsize(struct amdgpu_device *adev)
 }

 static void nbio_v6_1_sdma_doorbell_range(struct amdgpu_device *adev, int 
instance,
-   bool use_doorbell, int doorbell_index, int 
doorbell_size)
+   bool use_doorbell, int index_in_dw, int range_dw_size)
 {
 u32 reg = instance == 0 ? SOC15_REG_OFFSET(NBIO, 0, 
mmBIF_SDMA0_DOORBELL_RANGE) :
 SOC15_REG_OFFSET(NBIO, 0, mmBIF_SDMA1_DOORBELL_RANGE);
@@ -76,8 +76,8 @@ static void nbio_v6_1_sdma_doorbell_range(struct 
amdgpu_device *adev, int instan
 u32 doorbell_range = RREG32(reg);

 if (use_doorbell) {
-   doorbell_range = REG_SET_FIELD(doorbell_range, 
BIF_SDMA0_DOORBELL_RANGE, OFFSET, doorbell_index);
-   doorbell_range = REG_SET_FIELD(doorbell_range, 

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v7

2019-02-06 Thread Yang, Philip
 >> +/* If userptr are updated after amdgpu_cs_parser_bos(), restart
 >> cs */
 >>   amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
 >>   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
 >> -if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
 >> -r = -ERESTARTSYS;
 >> -goto error_abort;
 >> -}
 >> +r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
 >
 > Just abort when you see the first one with a problem here.
 >
 > There is no value in checking all of them.
 >
No, amdgpu_ttm_tt_get_user_pages_done calls hmm_vma_range_done to stop HMM
tracking and free the range structure memory; this needs to go through all
userptr BOs.

 >> +
 >> +if (r == -EAGAIN) {
 >> +if (!--tries) {
 >> +DRM_ERROR("Possible deadlock? Retry too many times\n");
 >> +return -EDEADLK;
 >> +}
 >> +goto restart;
 >> +}
 >> +
 >
 > I would still say to better just return to userspace here.
 >
 > Because of the way HMM works 10 retries might not be sufficient any more.
 >
Yes, it looks better to handle the retry from user space. The extra syscall
overhead can be ignored because this does not happen all the time. I will
submit a new patch for review.

Thanks,
Philip

On 2019-02-06 4:20 a.m., Christian König wrote:
> Am 05.02.19 um 23:00 schrieb Yang, Philip:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> userptr and start CPU page table update track of those pages. Then use
>> hmm_vma_range_done() to check if those pages are updated before
>> amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
>>
>> If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
>> from scratch, for kfd, restore worker is rescheduled to retry.
>>
>> HMM simplifies the concurrent CPU page table update check, so remove
>> guptasklock, mmu_invalidations, last_set_pages fields from
>> amdgpu_ttm_tt struct.
>>
>> HMM does not pin the page (increase page ref count), so remove related
>> operations like release_pages(), put_page(), mark_page_dirty().
>>
>> Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67
>> Signed-off-by: Philip Yang 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   1 -
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c    | 158 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c    |  25 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h    |   4 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 178 +++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   3 +-
>>   9 files changed, 198 insertions(+), 282 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index 0b31a1859023..0e1711a75b68 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -61,7 +61,6 @@ struct kgd_mem {
>>   atomic_t invalid;
>>   struct amdkfd_process_info *process_info;
>> -    struct page **user_pages;
>>   struct amdgpu_sync sync;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index d7b10d79f1de..ae2d838d31ea 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, 
>> struct mm_struct *mm,
>>   goto out;
>>   }
>> -    /* If no restore worker is running concurrently, user_pages
>> - * should not be allocated
>> - */
>> -    WARN(mem->user_pages, "Leaking user_pages array");
>> -
>> -    mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
>> -   sizeof(struct page *),
>> -   GFP_KERNEL | __GFP_ZERO);
>> -    if (!mem->user_pages) {
>> -    pr_err("%s: Failed to allocate pages array\n", __func__);
>> -    ret = -ENOMEM;
>> -    goto unregister_out;
>> -    }
>> -
>> -    ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages);
>> +    ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages);
>>   if (ret) {
>>   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
>> -    goto free_out;
>> +    goto unregister_out;
>>   }
>> -    amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages);
>> -
>>   ret = amdgpu_bo_reserve(bo, true);
>>   if (ret) {
>>   pr_err("%s: Failed to reserve BO\n", __func__);
>> @@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, 
>> struct mm_struct *mm,
>>   amdgpu_bo_unreserve(bo);
>>   release_out:
>> -    if (ret)
>> -    release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
>> -free_out:
>> -    

[PATCH 3/3] drm/amdgpu: Add dword term to the doorbell index in rings

2019-02-06 Thread Zhao, Yong
When doorbells are 8 bytes wide on SOC15, doorbell_index in rings no longer
reflects its true unit. So we should make its dword nature explicit in the
name, as a generic way to accommodate different doorbell sizes.
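
The index arithmetic behind the rename can be shown with a small sketch. The helper names and entry_dw_size parameter are illustrative assumptions; the SOC15 case in the patches has 2-dword (8-byte) entries, so a logical entry index and its dword index differ by a factor of entry_dw_size:

```c
#include <assert.h>
#include <stdint.h>

/* Logical doorbell entry index -> index in dwords.  With 8-byte
 * doorbells (entry_dw_size == 2), entry i lives at dword 2*i. */
static uint32_t entry_to_dw_idx(uint32_t entry_idx, uint32_t entry_dw_size)
{
	return entry_idx * entry_dw_size;
}

/* Dword index -> byte offset into the doorbell BAR. */
static uint32_t dw_idx_to_byte_off(uint32_t dw_idx)
{
	return dw_idx * sizeof(uint32_t);
}
```

Calling a field `doorbell_dw_idx` instead of `doorbell_index` records which of these two units the value is in.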

Change-Id: I053c69498af5d68df1783a7eb03106dd68f5e8cc
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c|  6 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 16 
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 20 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   |  6 +++---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 16 
 drivers/gpu/drm/amd/amdgpu/soc15.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c|  8 
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c|  8 
 11 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 97a60da62004..6389ef068f4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -250,7 +250,7 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
ring->adev = NULL;
ring->ring_obj = NULL;
ring->use_doorbell = true;
-   ring->doorbell_index = adev->doorbell_index.kiq;
+   ring->doorbell_dw_idx = adev->doorbell_index.kiq;
 
r = amdgpu_gfx_kiq_acquire(adev, ring);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index d7fae2676269..0c6fa7576a9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -198,7 +198,7 @@ struct amdgpu_ring {
uint64_tmqd_gpu_addr;
void*mqd_ptr;
uint64_teop_gpu_addr;
-   u32 doorbell_index;
+   u32 doorbell_dw_idx;
booluse_doorbell;
booluse_pollmem;
unsignedwptr_offs;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 305276c7e4bf..a6aff3f9ab9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -3109,7 +3109,7 @@ static int gfx_v6_0_sw_init(void *handle)
ring = >gfx.compute_ring[i];
ring->ring_obj = NULL;
ring->use_doorbell = false;
-   ring->doorbell_index = 0;
+   ring->doorbell_dw_idx = 0;
ring->me = 1;
ring->pipe = i;
ring->queue = i;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 7984292f9282..3f11f0ac43fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2641,7 +2641,7 @@ static void gfx_v7_0_ring_set_wptr_compute(struct 
amdgpu_ring *ring)
 
/* XXX check if swapping is necessary on BE */
adev->wb.wb[ring->wptr_offs] = lower_32_bits(ring->wptr);
-   WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr));
+   WDOORBELL32(ring->doorbell_dw_idx, lower_32_bits(ring->wptr));
 }
 
 /**
@@ -2957,7 +2957,7 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
mqd->cp_hqd_pq_doorbell_control &=
~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET_MASK;
mqd->cp_hqd_pq_doorbell_control |=
-   (ring->doorbell_index <<
+   (ring->doorbell_dw_idx <<
 CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT);
mqd->cp_hqd_pq_doorbell_control |=
CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
@@ -4363,7 +4363,7 @@ static int gfx_v7_0_compute_ring_init(struct 
amdgpu_device *adev, int ring_id,
 
ring->ring_obj = NULL;
ring->use_doorbell = true;
-   ring->doorbell_index = adev->doorbell_index.mec_ring0 + ring_id;
+   ring->doorbell_dw_idx = adev->doorbell_index.mec_ring0 + ring_id;
sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
 
irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index a26747681ed6..a2ae1446e7a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1890,7 +1890,7 @@ static int gfx_v8_0_compute_ring_init(struct 
amdgpu_device *adev, int ring_id,
 
ring->ring_obj = NULL;
ring->use_doorbell = true;
-   ring->doorbell_index = adev->doorbell_index.mec_ring0 + ring_id;
+   ring->doorbell_dw_idx = adev->doorbell_index.mec_ring0 + ring_id;
ring->eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr
+ (ring_id * 

[PATCH 1/3] drm/amdgpu: Improve doorbell variable names

2019-02-06 Thread Zhao, Yong
Indicate that the doorbell offset and range are in dwords.

Change-Id: Ib0f2564ffa7b1940ffb8725cdc03f662184f5436
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h   |  2 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   |  3 ++-
 drivers/gpu/drm/amd/amdgpu/soc15.c   |  4 ++--
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c|  6 +++---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c   |  6 +++---
 drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c |  6 +-
 12 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d67f8b1dfe80..88b3bbcea756 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -642,13 +642,13 @@ struct amdgpu_nbio_funcs {
void (*hdp_flush)(struct amdgpu_device *adev, struct amdgpu_ring *ring);
u32 (*get_memsize)(struct amdgpu_device *adev);
void (*sdma_doorbell_range)(struct amdgpu_device *adev, int instance,
-   bool use_doorbell, int doorbell_index, int 
doorbell_size);
+   bool use_doorbell, int index_in_dw, int range_dw_size);
void (*enable_doorbell_aperture)(struct amdgpu_device *adev,
 bool enable);
void (*enable_doorbell_selfring_aperture)(struct amdgpu_device *adev,
  bool enable);
void (*ih_doorbell_range)(struct amdgpu_device *adev,
- bool use_doorbell, int doorbell_index);
+ bool use_doorbell, int index_in_dw);
void (*update_medium_grain_clock_gating)(struct amdgpu_device *adev,
 bool enable);
void (*update_medium_grain_light_sleep)(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 1cfec06f81d4..5c8d04c353d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -39,6 +39,7 @@ struct amdgpu_doorbell {
  * can be 64-bit, so the index defined is in qword.
  */
 struct amdgpu_doorbell_index {
+   uint32_t entry_dw_size;
uint32_t kiq;
uint32_t mec_ring0;
uint32_t mec_ring1;
@@ -73,7 +74,7 @@ struct amdgpu_doorbell_index {
};
uint32_t max_assignment;
/* Per engine SDMA doorbell size in dword */
-   uint32_t sdma_doorbell_range;
+   uint32_t dw_range_per_sdma_eng;
 };
 
 typedef enum _AMDGPU_DOORBELL_ASSIGNMENT
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
index 1ccb1831382a..2572191b394a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
@@ -33,7 +33,7 @@ struct amdgpu_iv_entry;
 struct amdgpu_ih_ring {
unsignedring_size;
uint32_tptr_mask;
-   u32 doorbell_index;
+   u32 doorbell_idx_in_dw;
booluse_doorbell;
booluse_bus_addr;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index cc967dbfd631..bcc41c957b24 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -68,7 +68,7 @@ static u32 nbio_v6_1_get_memsize(struct amdgpu_device *adev)
 }
 
 static void nbio_v6_1_sdma_doorbell_range(struct amdgpu_device *adev, int 
instance,
-   bool use_doorbell, int doorbell_index, int 
doorbell_size)
+   bool use_doorbell, int index_in_dw, int range_dw_size)
 {
u32 reg = instance == 0 ? SOC15_REG_OFFSET(NBIO, 0, 
mmBIF_SDMA0_DOORBELL_RANGE) :
SOC15_REG_OFFSET(NBIO, 0, mmBIF_SDMA1_DOORBELL_RANGE);
@@ -76,8 +76,8 @@ static void nbio_v6_1_sdma_doorbell_range(struct 
amdgpu_device *adev, int instan
u32 doorbell_range = RREG32(reg);
 
if (use_doorbell) {
-   doorbell_range = REG_SET_FIELD(doorbell_range, 
BIF_SDMA0_DOORBELL_RANGE, OFFSET, doorbell_index);
-   doorbell_range = REG_SET_FIELD(doorbell_range, 
BIF_SDMA0_DOORBELL_RANGE, SIZE, doorbell_size);
+   doorbell_range = REG_SET_FIELD(doorbell_range, 
BIF_SDMA0_DOORBELL_RANGE, OFFSET, index_in_dw);
+   doorbell_range = REG_SET_FIELD(doorbell_range, 
BIF_SDMA0_DOORBELL_RANGE, SIZE, range_dw_size);
} else
doorbell_range = REG_SET_FIELD(doorbell_range, 

[PATCH 2/3] drm/amdgpu: Fix a typo in vega10_ih_set_rptr()

2019-02-06 Thread Zhao, Yong
Clearly, it should be a 64-bit doorbell operation.

Change-Id: I644a2ebcb18c2ede24ee15692a6189efad10a35c
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 796004896661..36f0e3cada30 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -377,7 +377,7 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev,
if (ih->use_doorbell) {
/* XXX check if swapping is necessary on BE */
*ih->rptr_cpu = ih->rptr;
-   WDOORBELL32(ih->doorbell_idx_in_dw, ih->rptr);
+   WDOORBELL64(ih->doorbell_idx_in_dw, ih->rptr);
} else if (ih == >irq.ih) {
WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
} else if (ih == >irq.ih1) {
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()

2019-02-06 Thread Daniel Vetter
On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote:
> 
> 
> Den 05.02.2019 17.31, skrev Daniel Vetter:
> > On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote:
> >>
> >>
> >> Den 05.02.2019 10.11, skrev Daniel Vetter:
> >>> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote:
> 
> 
>  Den 04.02.2019 16.41, skrev Daniel Vetter:
> > On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote:
> >> The only thing now that makes drm_dev_unplug() special is that it sets
> >> drm_device->unplugged. Move this code to drm_dev_unregister() so that 
> >> we
> >> can remove drm_dev_unplug().
> >>
> >> Signed-off-by: Noralf Trønnes 
> >> ---
> 
>  [...]
> 
> >>  drivers/gpu/drm/drm_drv.c | 27 +++
> >>  include/drm/drm_drv.h | 10 --
> >>  2 files changed, 19 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >> index 05bbc2b622fc..e0941200edc6 100644
> >> --- a/drivers/gpu/drm/drm_drv.c
> >> +++ b/drivers/gpu/drm/drm_drv.c
> >> @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit);
> >>   */
> >>  void drm_dev_unplug(struct drm_device *dev)
> >>  {
> >> -  /*
> >> -   * After synchronizing any critical read section is guaranteed 
> >> to see
> >> -   * the new value of ->unplugged, and any critical section which 
> >> might
> >> -   * still have seen the old value of ->unplugged is guaranteed 
> >> to have
> >> -   * finished.
> >> -   */
> >> -  dev->unplugged = true;
> >> -  synchronize_srcu(_unplug_srcu);
> >> -
> >>drm_dev_unregister(dev);
> >>drm_dev_put(dev);
> >>  }
> >> @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register);
> >>   * drm_dev_register() but does not deallocate the device. The caller 
> >> must call
> >>   * drm_dev_put() to drop their final reference.
> >>   *
> >> - * A special form of unregistering for hotpluggable devices is 
> >> drm_dev_unplug(),
> >> - * which can be called while there are still open users of @dev.
> >> + * This function can be called while there are still open users of 
> >> @dev as long
> >> + * as the driver protects its device resources using drm_dev_enter() 
> >> and
> >> + * drm_dev_exit().
> >>   *
> >>   * This should be called first in the device teardown code to make 
> >> sure
> >> - * userspace can't access the device instance any more.
> >> + * userspace can't access the device instance any more. Drivers that 
> >> support
> >> + * device unplug will probably want to call 
> >> drm_atomic_helper_shutdown() first
> >
> > Read once more with a bit more coffee, spotted this:
> >
> > s/first/afterwards/ - shutting down the hw before we've taken it away 
> > from
> > userspace is kinda the wrong way round. It should be the inverse of 
> > driver
> > load, which is 1) allocate structures 2) prep hw 3) register driver with
> > the world (simplified ofc).
> >
> 
>  The problem is that drm_dev_unregister() sets the device as unplugged
>  and if drm_atomic_helper_shutdown() is called afterwards it's not
>  allowed to touch hardware.
> 
>  I know it's the wrong order, but the only way to do it in the right
>  order is to have a separate function that sets unplugged:
> 
>   drm_dev_unregister();
>   drm_atomic_helper_shutdown();
>   drm_dev_set_unplugged();
> >>>
> >>> Annoying ... but yeah calling _shutdown() before we stopped userspace is
> >>> also not going to work. Because userspace could quickly re-enable
> >>> something, and then the refcounts would be all wrong again and leaking
> >>> objects.
> >>>
> >>
> >> What happens with a USB device that is unplugged with open userspace,
> >> will that leak objects?
> > 
> > Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run
> > as normal, the only thing that should be skipped is actually touching the
> > hw (as long as the driver doesn't protect too much with
> > drm_dev_enter/exit). So all the software updates (including refcounting
> > updates) will still be done. Ofc current udl is not yet atomic, so in
> > reality something else happens.
> > 
> > And we ofc still have the same issue that if you just unload the driver,
> > then the hw will stay on (which might really confuse the driver on next
> > load, when it assumes that it only gets loaded from cold boot where
> > everything is off - which usually is the case on an arm soc at least).
> > 
> >>> I get a bit the feeling we're over-optimizing here with trying to devm-ize
> >>> drm_dev_register. Just getting drm_device correctly devm-ized is a big
> >>> step forward already, and will open up a lot of TODO items across a lot of
> >>> drivers. E.g. we 

RE: [PATCH 2/5] drm/amdgpu: Fix a bug in setting CP_MEC_DOORBELL_RANGE_UPPER on SOC15

2019-02-06 Thread Zeng, Oak
+ Clint/Alice

Hi Clint,

We think the CP_MEC_DOORBELL_RANGE_LOWER/UPPER registers are used by MEC to 
check whether a doorbell routed to MEC belongs to MEC. Is this understanding 
correct?

From the register spec, those registers are 27 bits wide. Does this mean MEC 
uses all 27 bits for the check? For example, if we set lower/upper to [0, 4K], 
will a doorbell ring at a 6K address be ignored by MEC?

Thanks,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Zhao, Yong
Sent: Tuesday, February 5, 2019 3:31 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong 
Subject: [PATCH 2/5] drm/amdgpu: Fix a bug in setting 
CP_MEC_DOORBELL_RANGE_UPPER on SOC15

Because CP can use all doorbells outside the ones reserved for SDMA, IH, and 
VCN, the value set to CP_MEC_DOORBELL_RANGE_UPPER should be the maximal 
index possible in a page.
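
A toy version of the formula the patch adds to the vega10/vega20 init hunks below. FAKE_PAGE_SIZE and the helper name are assumptions for illustration; the computation mirrors `PAGE_SIZE / (sizeof(uint32_t) * entry_dw_size) - 1`:

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_PAGE_SIZE 4096u	/* assumption: 4 KiB doorbell page */

/* Highest doorbell entry index that fits in one page, given how many
 * dwords each doorbell entry occupies. */
static uint32_t doorbell_last_idx(uint32_t entry_dw_size)
{
	return FAKE_PAGE_SIZE / (sizeof(uint32_t) * entry_dw_size) - 1;
}
```

So with SOC15's 8-byte entries the upper bound covers the whole page (index 511), not just the last user queue.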

Change-Id: I402a56ce9a80e6c2ed2f96be431ae71ca88e73a4
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h | 1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c | 3 +++  
drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c | 3 +++
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 5c8d04c353d0..90eca63605ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -73,6 +73,7 @@ struct amdgpu_doorbell_index {
} uvd_vce;
};
uint32_t max_assignment;
+   uint32_t last_idx;
/* Per engine SDMA doorbell size in dword */
uint32_t dw_range_per_sdma_eng;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 262ee3cf6f1c..0278e3ab6b94 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2998,7 +2998,7 @@ static int gfx_v9_0_kiq_init_register(struct amdgpu_ring 
*ring)
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER,
(adev->doorbell_index.kiq * 2) << 2);
WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER,
-   (adev->doorbell_index.userqueue_end * 
2) << 2);
+   (adev->doorbell_index.last_idx * 2) << 2);
}
 
WREG32_SOC15(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL, diff --git 
a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
index d2409df2dde9..9eb8c9209231 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c
@@ -88,5 +88,8 @@ void vega10_doorbell_index_init(struct amdgpu_device *adev)
(adev->doorbell_index.sdma_engine[1]
- adev->doorbell_index.sdma_engine[0])
* adev->doorbell_index.entry_dw_size;
+
+   adev->doorbell_index.last_idx = PAGE_SIZE
+   / (sizeof(uint32_t) * adev->doorbell_index.entry_dw_size) - 1;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
index b28c5999d8f0..aa8c7699c689 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
@@ -91,5 +91,8 @@ void vega20_doorbell_index_init(struct amdgpu_device *adev)
(adev->doorbell_index.sdma_engine[1]
- adev->doorbell_index.sdma_engine[0])
* adev->doorbell_index.entry_dw_size;
+
+   adev->doorbell_index.last_idx = PAGE_SIZE
+   / (sizeof(uint32_t) * adev->doorbell_index.entry_dw_size) - 1;
 }
 
--
2.17.1



BUG - unable to handle null pointer, bisected - drm/amd/display: add gpio lock/unlock

2019-02-06 Thread Przemek Socha
Good morning,

on my Lenovo G50-45 a6310 APU with R4 Mullins, commit 
e261568f94d6c37ebb94d3c4b3f8a3085375dd9d causes a kernel Oops (unable to 
handle a NULL pointer dereference).
I cross-checked by reverting the troublesome commit, and without it the 
machine works fine.

Here is a part of the Oops message from pstore:


<1>[   13.200310] BUG: unable to handle kernel NULL pointer dereference at 
0008
<1>[   13.200323] #PF error: [normal kernel read fault]
<6>[   13.200328] PGD 0 P4D 0 
<4>[   13.200335] Oops:  [#1] PREEMPT SMP
<4>[   13.200342] CPU: 2 PID: 2961 Comm: udevd Not tainted 5.0.0-rc1+ #47
<4>[   13.200347] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 
08/04/2016
<4>[   13.200450] RIP: 0010:dal_gpio_open_ex+0x0/0x30 [amdgpu]
<4>[   13.200456] Code: d6 48 89 de 48 89 ef e8 6e f8 ff ff 84 c0 74 c7 48 89 
e8 
5b 5d c3 0f 0b 31 ed 5b 48 89 e8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 <48> 83 
7f 08 00 74 08 0f 0b b8 05 00 00 00 c3 89 77 18 8b 57 14 4c
<4>[   13.200466] RSP: 0018:b78e82bb7650 EFLAGS: 00010282
<4>[   13.200471] RAX:  RBX: b78e82bb76a4 RCX: 

<4>[   13.200476] RDX: 0006 RSI: 0004 RDI: 

<4>[   13.200480] RBP: a1d695e93300 R08: 0003 R09: 
a1d692456600
<4>[   13.200485] R10: f7dc88574dc0 R11: b78e82bb75b8 R12: 
a1d695c68700
<4>[   13.200490] R13: c07ef5a0 R14: b78e82bb79b8 R15: 
a1d692456600
<4>[   13.200495] FS:  7f9c3fcac300() GS:a1d697b0() knlGS:

<4>[   13.200501] CS:  0010 DS:  ES:  CR0: 80050033
<4>[   13.200506] CR2: 0008 CR3: 0002124a CR4: 
000406e0
<4>[   13.200510] Call Trace:
<4>[   13.200605]  construct+0x15f/0x710 [amdgpu]
<4>[   13.200710]  link_create+0x2e/0x48 [amdgpu]
<4>[   13.200803]  dc_create+0x2c0/0x5f0 [amdgpu]
<4>[   13.200899]  dm_hw_init+0xe0/0x150 [amdgpu]
<4>[   13.200990]  amdgpu_device_init.cold.38+0xe06/0xf67 [amdgpu]
<4>[   13.201002]  ? kmalloc_order+0x13/0x38
<4>[   13.201102]  amdgpu_driver_load_kms+0x60/0x210 [amdgpu]
<4>[   13.201112]  drm_dev_register+0x10e/0x150
<4>[   13.201207]  amdgpu_pci_probe+0xb8/0x118 [amdgpu]
<4>[   13.201217]  ? _raw_spin_unlock_irqrestore+0xf/0x28
<4>[   13.201226]  pci_device_probe+0xd1/0x158
<4>[   13.201234]  really_probe+0xee/0x2a0
<4>[   13.201241]  driver_probe_device+0x4a/0xb0
<4>[   13.201247]  __driver_attach+0xaf/0xc8
<4>[   13.201253]  ? driver_probe_device+0xb0/0xb0
<4>[   13.201258]  bus_for_each_dev+0x6f/0xb8
<4>[   13.201265]  bus_add_driver+0x197/0x1d8
<4>[   13.201271]  ? 0xc0933000
<4>[   13.201276]  driver_register+0x66/0xa8
<4>[   13.201281]  ? 0xc0933000
<4>[   13.201287]  do_one_initcall+0x41/0x1e2
<4>[   13.201294]  ? wake_up_page_bit+0x21/0x100
<4>[   13.201301]  ? kmem_cache_alloc_trace+0x2e/0x1a0
<4>[   13.201308]  ? do_init_module+0x1d/0x1e0
<4>[   13.201315]  do_init_module+0x55/0x1e0
<4>[   13.201321]  load_module+0x205c/0x2488
<4>[   13.201329]  ? vfs_read+0x10e/0x138
<4>[   13.201337]  ? __do_sys_finit_module+0xba/0xd8
<4>[   13.201342]  __do_sys_finit_module+0xba/0xd8
<4>[   13.201350]  do_syscall_64+0x50/0x168
<4>[   13.201357]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[   13.201364] RIP: 0033:0x7f9c3fdcf409
<4>[   13.201371] Code: 18 c3 e8 3a 98 01 00 66 2e 0f 1f 84 00 00 00 00 00 48 
89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 
3d 01 f0 ff ff 73 01 c3 48 8b 0d 47 6a 0c 00 f7 d8 64 89 01 48
<4>[   13.201381] RSP: 002b:7fff9b4824f8 EFLAGS: 0246 ORIG_RAX: 
0139
<4>[   13.201389] RAX: ffda RBX: 559d56fe1780 RCX: 
7f9c3fdcf409
<4>[   13.201394] RDX:  RSI: 559d570385c0 RDI: 
000e
<4>[   13.201399] RBP:  R08:  R09: 
7fff9b482610
<4>[   13.201404] R10: 000e R11: 0246 R12: 
559d56ff2120
<4>[   13.201409] R13: 0002 R14: 559d570385c0 R15: 
559d56fe1780
<4>[   13.201416] Modules linked in: kvm_amd kvm ath9k irqbypass crc32_pclmul 
ghash_clmulni_intel serio_raw ath9k_common ath9k_hw sdhci_pci cqhci sdhci 
amdgpu(+) mmc_core mac80211 ath mfd_core chash cfg80211 gpu_sched ttm xhci_pci 
ehci_pci xhci_hcd ehci_hcd sp5100_tco
<4>[   13.201448] CR2: 0008
<4>[   13.206222] ---[ end trace 2244da3024c5ad93 ]---


Here is a full git bisect log on amd-staging-drm-next branch synced today:

git bisect start
# good: [e1be4cb583800db36ed7f6303f7a8c205be24ceb] drm/amd/display: Use memset 
to initialize variables in fill_plane_dcc_attributes
git bisect good e1be4cb583800db36ed7f6303f7a8c205be24ceb
# bad: [25fa5507b06b8cfbec6db7933615ae603516bb7b] drm/amd/display: Disconnect 
mpcc when changing tg
git bisect bad 25fa5507b06b8cfbec6db7933615ae603516bb7b
# good: [e7b4cc9edcbe9c07e5bae2dbdebb04b054e3ff5b] drm/amd/display: Remove 
FreeSync timing changed debug output
git bisect good 

Re: [PATCH v2 1/2] drm/amdgpu: Fix pci platform speed and width

2019-02-06 Thread Christian König

Am 06.02.19 um 02:13 schrieb Alex Deucher:

On Tue, Feb 5, 2019 at 4:38 PM Kasiviswanathan, Harish
 wrote:

The new Vega series GPU cards have built-in bridges. To get the PCIe
speed and width supported by the platform, walk the hierarchy and take the
slowest link.
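
The walk the commit message describes — follow upstream bridges toward the root and keep the slowest known link — can be sketched in miniature. fake_dev, SPEED_UNKNOWN, and min_chain_speed are illustrative stand-ins for struct pci_dev, PCI_SPEED_UNKNOWN, and the new helper, not the real PCI API:

```c
#include <assert.h>
#include <stddef.h>

#define SPEED_UNKNOWN (-1)

struct fake_dev {
	int speed;			/* higher = faster; SPEED_UNKNOWN if not reported */
	struct fake_dev *upstream;	/* pci_upstream_bridge() analogue */
};

/* Walk from a device up through its bridges and return the slowest
 * reported link speed, skipping hops that report nothing. */
static int min_chain_speed(struct fake_dev *dev)
{
	int min = SPEED_UNKNOWN;

	for (; dev; dev = dev->upstream) {
		if (dev->speed == SPEED_UNKNOWN)
			continue;
		if (min == SPEED_UNKNOWN || dev->speed < min)
			min = dev->speed;
	}
	return min;
}
```

This is why the patch matters for Vega's internal bridge: the GPU endpoint may report a faster cap than the platform link above it, and the slowest hop is what actually limits bandwidth.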

Change-Id: I3196d158b0c614cbb5d7a34c793a58cb95322d32
Signed-off-by: Harish Kasiviswanathan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 58 +++---
  1 file changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d7dddb9..fcab1fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3618,6 +3618,38 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 return r;
  }

+static void amdgpu_device_get_min_pci_speed_width(struct amdgpu_device *adev,
+ enum pci_bus_speed *speed,
+ enum pcie_link_width *width)
+{
+   struct pci_dev *pdev = adev->pdev;
+   enum pci_bus_speed cur_speed;
+   enum pcie_link_width cur_width;
+
+   *speed = PCI_SPEED_UNKNOWN;
+   *width = PCIE_LNK_WIDTH_UNKNOWN;
+
+   while (pdev) {
+   cur_speed = pcie_get_speed_cap(pdev);
+   cur_width = pcie_get_width_cap(pdev);
+
+   if (cur_speed != PCI_SPEED_UNKNOWN) {
+   if (*speed == PCI_SPEED_UNKNOWN)
+   *speed = cur_speed;
+   else if (cur_speed < *speed)
+   *speed = cur_speed;
+   }
+
+   if (cur_width != PCIE_LNK_WIDTH_UNKNOWN) {
+   if (*width == PCIE_LNK_WIDTH_UNKNOWN)
+   *width = cur_width;
+   else if (cur_width < *width)
+   *width = cur_width;
+   }
+   pdev = pci_upstream_bridge(pdev);
+   }
+}
+


That should probably rather be some helper function in the PCIe subsystem.

But for now it should be ok to have it here instead. Feel free to add an 
Acked-by: Christian König  as well.


Christian.


  /**
   * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
   *
@@ -3630,8 +3662,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev)
  {
 struct pci_dev *pdev;
-   enum pci_bus_speed speed_cap;
-   enum pcie_link_width link_width;
+   enum pci_bus_speed speed_cap, platform_speed_cap;
+   enum pcie_link_width platform_link_width;

 if (amdgpu_pcie_gen_cap)
 adev->pm.pcie_gen_mask = amdgpu_pcie_gen_cap;
@@ -3648,6 +3680,12 @@ static void amdgpu_device_get_pcie_info(struct 
amdgpu_device *adev)
 return;
 }

+   if (adev->pm.pcie_gen_mask && adev->pm.pcie_mlw_mask)
+   return;

I think you can drop this check, but I guess it doesn't hurt.  Either
way, series is:
Reviewed-by: Alex Deucher 


+
+   amdgpu_device_get_min_pci_speed_width(adev, _speed_cap,
+ _link_width);
+
 if (adev->pm.pcie_gen_mask == 0) {
 /* asic caps */
 pdev = adev->pdev;
@@ -3673,22 +3711,20 @@ static void amdgpu_device_get_pcie_info(struct 
amdgpu_device *adev)
 adev->pm.pcie_gen_mask |= 
CAIL_ASIC_PCIE_LINK_SPEED_SUPPORT_GEN1;
 }
 /* platform caps */
-   pdev = adev->ddev->pdev->bus->self;
-   speed_cap = pcie_get_speed_cap(pdev);
-   if (speed_cap == PCI_SPEED_UNKNOWN) {
+   if (platform_speed_cap == PCI_SPEED_UNKNOWN) {
 adev->pm.pcie_gen_mask |= 
(CAIL_PCIE_LINK_SPEED_SUPPORT_GEN1 |

CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2);
 } else {
-   if (speed_cap == PCIE_SPEED_16_0GT)
+   if (platform_speed_cap == PCIE_SPEED_16_0GT)
 adev->pm.pcie_gen_mask |= 
(CAIL_PCIE_LINK_SPEED_SUPPORT_GEN1 |

CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2 |

CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3 |

CAIL_PCIE_LINK_SPEED_SUPPORT_GEN4);
-   else if (speed_cap == PCIE_SPEED_8_0GT)
+   else if (platform_speed_cap == PCIE_SPEED_8_0GT)
 adev->pm.pcie_gen_mask |= 
(CAIL_PCIE_LINK_SPEED_SUPPORT_GEN1 |

CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2 |


Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v7

2019-02-06 Thread Christian König

Am 05.02.19 um 23:00 schrieb Yang, Philip:

Use HMM helper function hmm_vma_fault() to get physical pages backing
userptr and start CPU page table update track of those pages. Then use
hmm_vma_range_done() to check if those pages are updated before
amdgpu_cs_submit for gfx or before user queues are resumed for kfd.

If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
from scratch, for kfd, restore worker is rescheduled to retry.

HMM simplifies the concurrent CPU page table update check, so remove
guptasklock, mmu_invalidations, last_set_pages fields from
amdgpu_ttm_tt struct.

HMM does not pin the page (increase page ref count), so remove related
operations like release_pages(), put_page(), mark_page_dirty().

Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 -
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 158 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c|  25 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   4 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 178 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   3 +-
  9 files changed, 198 insertions(+), 282 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 0b31a1859023..0e1711a75b68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -61,7 +61,6 @@ struct kgd_mem {
  
  	atomic_t invalid;

struct amdkfd_process_info *process_info;
-   struct page **user_pages;
  
  	struct amdgpu_sync sync;
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

index d7b10d79f1de..ae2d838d31ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, struct mm_struct *mm,
goto out;
}
  
-	/* If no restore worker is running concurrently, user_pages
-	 * should not be allocated
-	 */
-	WARN(mem->user_pages, "Leaking user_pages array");
-
-   mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
-  sizeof(struct page *),
-  GFP_KERNEL | __GFP_ZERO);
-   if (!mem->user_pages) {
-   pr_err("%s: Failed to allocate pages array\n", __func__);
-   ret = -ENOMEM;
-   goto unregister_out;
-   }
-
-   ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages);
+   ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages);
if (ret) {
pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
-   goto free_out;
+   goto unregister_out;
}
  
-	amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages);
-
ret = amdgpu_bo_reserve(bo, true);
if (ret) {
pr_err("%s: Failed to reserve BO\n", __func__);
@@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, struct mm_struct *mm,
amdgpu_bo_unreserve(bo);
  
  release_out:

-   if (ret)
-   release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
-free_out:
-   kvfree(mem->user_pages);
-   mem->user_pages = NULL;
+   amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
  unregister_out:
if (ret)
amdgpu_mn_unregister(bo);
@@ -679,7 +659,6 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
ctx->kfd_bo.priority = 0;
	ctx->kfd_bo.tv.bo = &mem->bo->tbo;
ctx->kfd_bo.tv.num_shared = 1;
-   ctx->kfd_bo.user_pages = NULL;
	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
  
	amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);

@@ -743,7 +722,6 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
ctx->kfd_bo.priority = 0;
	ctx->kfd_bo.tv.bo = &mem->bo->tbo;
ctx->kfd_bo.tv.num_shared = 1;
-   ctx->kfd_bo.user_pages = NULL;
	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
  
  	i = 0;

@@ -1371,15 +1349,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
	list_del(&bo_list_entry->head);
mutex_unlock(_info->lock);
  
-	/* Free user pages if necessary */
-   if (mem->user_pages) {
-   pr_debug("%s: Freeing user_pages array\n", __func__);
-   if (mem->user_pages[0])
-   release_pages(mem->user_pages,
-   mem->bo->tbo.ttm->num_pages);
-   kvfree(mem->user_pages);
-   }
-
	ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
if (unlikely(ret))
return ret;
@@ -1855,25 +1824,11 @@ static int 

Re: [PATCH 6/8] drm/amdgpu: allocate VM PDs/PTs on demand

2019-02-06 Thread Koenig, Christian
Am 06.02.19 um 00:51 schrieb Kuehling, Felix:
> On 2019-02-05 11:00 a.m., Koenig, Christian wrote:
>> Ah! The initial clear of the PDs is syncing to everything!
> Right. Using amdgpu_sync_resv(... AMDGPU_FENCE_OWNER_VM ...) in
> amdgpu_vm_clear_bo seems to help. But if I also change the
> amdgpu_job_submit to use AMDGPU_FENCE_OWNER_VM, I get a VM fault and CP
> hang very early in my test. Not sure what's happening there. This is
> what I changed:
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 8b240f9..06c28ac 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -826,17 +826,18 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
>
>   WARN_ON(job->ibs[0].length_dw > 64);
>   r = amdgpu_sync_resv(adev, &job->sync, bo->tbo.resv,
> -AMDGPU_FENCE_OWNER_UNDEFINED, false);
> +AMDGPU_FENCE_OWNER_VM, false);

That's correct.

>   if (r)
>   goto error_free;
>
> -   r = amdgpu_job_submit(job, &vm->entity, AMDGPU_FENCE_OWNER_UNDEFINED,
> +   r = amdgpu_job_submit(job, &vm->entity, AMDGPU_FENCE_OWNER_VM,

This change most likely doesn't make a difference.

> &fence);
>   if (r)
>   goto error_free;
>
>   amdgpu_bo_fence(bo, fence, true);
> -   dma_fence_put(fence);
> +   dma_fence_put(vm->last_update);
> +   vm->last_update = fence;

This is most likely incorrect.

>
>   if (bo->shadow)
>   return amdgpu_vm_clear_bo(adev, vm, bo->shadow,
>
> Basically I tried to do the synchronization exactly like
> amdgpu_vm_update_directories.
>
> The only way this clear could cause a VM fault is, if a subsequent
> PTE/PDS update happens too early and gets overwritten by the pending
> clear. But don't the clear and the next PTE/PDE update go to the same
> queue (vm->entity) and are implicitly synchronized?

Need to check where this is coming from.

> Answer after another hour of pouring over code and history: No, sched
> entities are no longer equivalent to queues. The scheduler can itself
> load-balance VM updates using multiple SDMA queues. Sigh.

That is unproblematic. Load balancing happens only when the entity is idle.

E.g. submissions to the same entity never run in parallel.

> OK, so page table updates to different PTEs don't usually need to
> synchronize with each other. But how do page table updates to the same
> address get synchronized? How do you guarantee that two updates of the
> same PTE are processed by the hardware in the correct order, if
> AMDGPU_FENCE_OWNER_VM fences don't synchronize with each other?

Mhm, need to double check but that explanation doesn't seem to fit. 
There must be something else going on.

>
>
>> Ok, that is easy to fix.
> Not that easy. See above. ;)

Well need to take a closer look why you run into VM faults, but 
basically replacing the parameter in amdgpu_sync_resv should already do 
the trick.

Regards,
Christian.

>
> Regards,
>     Felix
>
>
>
>> Christian.
>>
>> Am 05.02.19 um 16:49 schrieb Kuehling, Felix:
>>> I saw a backtrace from the GPU scheduler when I was debugging this 
>>> yesterday, but those backtraces never tell us where the command submission 
>>> came from. It could be memory initialization, or some migration at 
>>> buffer-validation. Basically, any command submission triggered by page 
>>> table allocation that synchronizes with the PD reservation object is a 
>>> problem.
>>>
>>> Regards,
>>>  Felix
>>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Tuesday, February 05, 2019 10:40 AM
>>> To: Kuehling, Felix ; Koenig, Christian 
>>> ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 6/8] drm/amdgpu: allocate VM PDs/PTs on demand
>>>
>>> Am 05.02.19 um 16:20 schrieb Kuehling, Felix:
>> This may cause regressions in KFD if PT BO allocation can trigger
>> eviction fences.
> I don't think that can happen, but need to double check as well.
>
> Otherwise allocating page tables could try to evict other page tables 
> from the same process and that seriously doesn't make much sense.
 Eviction fences don't cause actual memory evictions. They are the result 
 of memory evictions. They cause KFD processes to be preempted in order to 
 allow the fence to signal so memory can be evicted. The problem is that a 
 page table allocation waits for the fences on the PD reservation, which 
 looks like an eviction to KFD and triggers an unnecessary preemption. 
 There is no actual memory eviction happening here.
>>> Yeah, but where is that actually coming from?
>>>
>>> It sounds like we still unnecessary synchronize page table updates 
>>> somewhere.
>>>
>>> Do you have a call chain?
>>>
>>> Regards,
>>> Christian.
>>>
 Regards,
   Felix

 -Original Message-

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

2019-02-06 Thread Christian König

Am 05.02.19 um 22:56 schrieb Yang, Philip:

Hi Christian,

I will submit new patch for review, my comments embedded inline below.

Thanks,
Philip

On 2019-02-05 1:09 p.m., Koenig, Christian wrote:

Am 05.02.19 um 18:25 schrieb Yang, Philip:

[SNIP]
+
+    if (r == -ERESTARTSYS) {
+    if (!--tries) {
+    DRM_ERROR("Possible deadlock? Retry too many times\n");
+    return -EDEADLK;
+    }
+    goto restart;

You really need to restart the IOCTL to potentially handle signals.

But since the calling code correctly handles this already you can just
drop this change as far as I can see.


I agree that we should return -ERESTARTSYS to upper layer to handle signals.

But I do not find upper layers handle -ERESTARTSYS in the entire calling
path, ksys_ioctl -> do_vfs_ioctl -> amdgpu_drm_ioctl -> drm->ioctl ->
drm_ioctl_kernel -> amdgpu_cs_ioctl. The error code returns to
application. I confirm it, libdrm userptr test application calling
amdgpu_cs_ioctl return code is -512, which is -ERESTARTSYS.

So application should handle -ERESTARTSYS to restart the ioctl, but
libdrm userptr test application doesn't handle this. This causes the
test failed.

This is a bug in the test cases then.

-ERESTARTSYS can happen at any time during interruptible waiting and it
is mandatory for the upper layer to handle it correctly.


-ERESTARTSYS can be returned only when a signal is pending; the signal
handling code translates ERESTARTSYS to EINTR, and drmIoctl in libdrm
does handle EINTR and restarts the ioctl. The test cases are ok.


Interesting, I thought that this was handled unconditionally, whether a 
signal is pending or not. But that could as well be architecture specific.



The driver fail path should not return -ERESTARTSYS to user space. In
the new patch, I change amdgpu_cs_submit to return -EAGAIN if the
userptr was updated, and amdgpu_cs_ioctl redoes the ioctl only if the
error code is -EAGAIN. The -ERESTARTSYS error code returns to user
space for signal handling as before.


Good idea, that is probably cleaner. EAGAIN and EINTR should have the 
same effect in drmIoctl IIRC.



Below are details of the userptr path difference. For the previous
path, the libdrm test always goes to step 2; step 3 never triggers. So
it never returns -ERESTARTSYS, but in theory this could happen.

For the HMM path, the test always goes to step 3, so we have to handle
this case inside amdgpu_cs_ioctl. Maybe I can change amdgpu_cs_submit
to return -EBUSY, then restart the ioctl inside amdgpu_cs_ioctl. I will
submit a new patch.

Clearly a NAK, this won't work correctly.


I don't understand your concern, may you explain the reason?


When some underlying code returns ERESTARTSYS we *MUST* forward that to 
the upper layer.


Handling it in a loop or replacing it manually with EINTR or EAGAIN will 
break signal handling.


This is suggested quite often, but can end up in an endless loop inside 
the kernel which is quite ugly.


Regards,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx