[PATCH] drm/amdgpu: improve GTT BO alloc speed in OGL

2016-09-13 Thread Michel Dänzer
On 13/09/16 01:44 AM, Alex Deucher wrote:
> From: "monk.liu" 
> 
> Originally we used the ttm_dma path to allocate GTT BOs, which is much
> slower than the ttm_pool path in most cases.
> 
> The swiotlb checks don't seem to work and we always end up in the
> slow path even when an IOMMU is available.

This change will break any cases where SWIOTLB is actually necessary
though, won't it?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[PATCH] drm/amdgpu: improve GTT BO alloc speed in OGL

2016-09-12 Thread Alex Deucher
On Mon, Sep 12, 2016 at 9:17 PM, Michel Dänzer wrote:
> On 13/09/16 01:44 AM, Alex Deucher wrote:
>> From: "monk.liu" 
>>
>> Originally we used the ttm_dma path to allocate GTT BOs, which is much
>> slower than the ttm_pool path in most cases.
>>
>> The swiotlb checks don't seem to work and we always end up in the
>> slow path even when an IOMMU is available.
>
> This change will break any cases where SWIOTLB is actually necessary
> though, won't it?

Yes, theoretically.

Alex

>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


[PATCH] drm/amdgpu: improve GTT BO alloc speed in OGL

2016-09-12 Thread Christian König
Am 12.09.2016 um 18:44 schrieb Alex Deucher:
> From: "monk.liu" 
>
> Originally we used the ttm_dma path to allocate GTT BOs, which is much
> slower than the ttm_pool path in most cases.
>
> The swiotlb checks don't seem to work and we always end up in the
> slow path even when an IOMMU is available.

While the check is clearly not correct, simply always using the direct
mapping and not checking the fallback path can break as well.

So this patch is clearly not a good idea and needs to be fixed before it 
is pushed.

Christian.

> Signed-off-by: monk.liu 
> Reviewed-by: Jammy Zhou 
> Signed-off-by: Alex Deucher 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 -------------
>   1 file changed, 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 3beb10b..e2fcd39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -783,12 +783,6 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm)
>   
>   adev = amdgpu_get_adev(ttm->bdev);
>   
> -#ifdef CONFIG_SWIOTLB
> - if (swiotlb_nr_tbl()) {
> - return ttm_dma_populate(&gtt->ttm, adev->dev);
> - }
> -#endif
> -
>   r = ttm_pool_populate(ttm);
>   if (r) {
>   return r;
> @@ -829,13 +823,6 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_tt *ttm)
>   
>   adev = amdgpu_get_adev(ttm->bdev);
>   
> -#ifdef CONFIG_SWIOTLB
> - if (swiotlb_nr_tbl()) {
> - ttm_dma_unpopulate(&gtt->ttm, adev->dev);
> - return;
> - }
> -#endif
> -
>   for (i = 0; i < ttm->num_pages; i++) {
>   if (gtt->ttm.dma_address[i]) {
>   pci_unmap_page(adev->pdev, gtt->ttm.dma_address[i],




[PATCH] drm/amdgpu: improve GTT BO alloc speed in OGL

2016-09-12 Thread Alex Deucher
On Mon, Sep 12, 2016 at 2:26 PM, Christian König wrote:
> Am 12.09.2016 um 18:44 schrieb Alex Deucher:
>>
>> From: "monk.liu" 
>>
>> Originally we used the ttm_dma path to allocate GTT BOs, which is much
>> slower than the ttm_pool path in most cases.
>>
>> The swiotlb checks don't seem to work and we always end up in the
>> slow path even when an IOMMU is available.
>
>
> While the check is clearly not correct, simply always using the direct
> mapping and not checking the fallback path can break as well.
>
> So this patch is clearly not a good idea and needs to be fixed before it is
> pushed.

Jerome looked into it when Monk first debugged this, but I don't think
anything ever came of it:
https://patchwork.kernel.org/patch/7079521/

Alex

>
> Christian.
>
>
>> Signed-off-by: monk.liu 
>> Reviewed-by: Jammy Zhou 
>> Signed-off-by: Alex Deucher 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 -------------
>>   1 file changed, 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 3beb10b..e2fcd39 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -783,12 +783,6 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm)
>> adev = amdgpu_get_adev(ttm->bdev);
>>   -#ifdef CONFIG_SWIOTLB
>> -   if (swiotlb_nr_tbl()) {
>> -   return ttm_dma_populate(&gtt->ttm, adev->dev);
>> -   }
>> -#endif
>> -
>> r = ttm_pool_populate(ttm);
>> if (r) {
>> return r;
>> @@ -829,13 +823,6 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_tt
>> *ttm)
>> adev = amdgpu_get_adev(ttm->bdev);
>>   -#ifdef CONFIG_SWIOTLB
>> -   if (swiotlb_nr_tbl()) {
>> -   ttm_dma_unpopulate(&gtt->ttm, adev->dev);
>> -   return;
>> -   }
>> -#endif
>> -
>> for (i = 0; i < ttm->num_pages; i++) {
>> if (gtt->ttm.dma_address[i]) {
>> pci_unmap_page(adev->pdev,
>> gtt->ttm.dma_address[i],
>
>
>


[PATCH] drm/amdgpu: improve GTT BO alloc speed in OGL

2016-09-12 Thread Alex Deucher
From: "monk.liu" 

Originally we used the ttm_dma path to allocate GTT BOs, which is much
slower than the ttm_pool path in most cases.

The swiotlb checks don't seem to work and we always end up in the
slow path even when an IOMMU is available.

Signed-off-by: monk.liu 
Reviewed-by: Jammy Zhou 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 3beb10b..e2fcd39 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -783,12 +783,6 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm)
 
 	adev = amdgpu_get_adev(ttm->bdev);
 
-#ifdef CONFIG_SWIOTLB
-	if (swiotlb_nr_tbl()) {
-		return ttm_dma_populate(&gtt->ttm, adev->dev);
-	}
-#endif
-
 	r = ttm_pool_populate(ttm);
 	if (r) {
 		return r;
@@ -829,13 +823,6 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_tt *ttm)
 
 	adev = amdgpu_get_adev(ttm->bdev);
 
-#ifdef CONFIG_SWIOTLB
-	if (swiotlb_nr_tbl()) {
-		ttm_dma_unpopulate(&gtt->ttm, adev->dev);
-		return;
-	}
-#endif
-
 	for (i = 0; i < ttm->num_pages; i++) {
 		if (gtt->ttm.dma_address[i]) {
 			pci_unmap_page(adev->pdev, gtt->ttm.dma_address[i],
-- 
2.5.5
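
The trade-off raised in the replies above can be sketched as a small
user-space model. This is not kernel code and not a proposed fix: the
struct, its fields, and every function name below are hypothetical and
exist only for illustration. It compares three policies: the original
check (any active SWIOTLB selects the ttm_dma path), the patch (always the
ttm_pool path), and one possible refinement, assumed here, that takes the
ttm_dma path only when SWIOTLB is active and the device cannot reach all
of system memory directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two populate paths discussed in the thread. */
enum populate_path { PATH_TTM_POOL, PATH_TTM_DMA };

/* Hypothetical description of the system; not a kernel structure. */
struct sys_model {
	bool swiotlb_active;    /* stands in for swiotlb_nr_tbl() != 0 */
	bool iommu_present;     /* an IOMMU remaps DMA for the device */
	uint64_t max_phys_addr; /* highest physical address of system RAM */
	unsigned dev_dma_bits;  /* width of the device DMA mask, 1..64 */
};

/* Original behaviour: an active SWIOTLB always selects the slow path. */
static enum populate_path path_before_patch(const struct sys_model *s)
{
	return s->swiotlb_active ? PATH_TTM_DMA : PATH_TTM_POOL;
}

/* Behaviour with the patch applied: the check is gone entirely. */
static enum populate_path path_after_patch(const struct sys_model *s)
{
	(void)s;
	return PATH_TTM_POOL;
}

/* Assumed criterion: bounce buffers are needed only when there is no
 * IOMMU and some RAM lies above what the device can address. */
static bool device_needs_bounce(const struct sys_model *s)
{
	assert(s->dev_dma_bits >= 1 && s->dev_dma_bits <= 64);
	if (s->iommu_present || s->dev_dma_bits == 64)
		return false;
	return s->max_phys_addr > ((UINT64_C(1) << s->dev_dma_bits) - 1);
}

/* Assumed refinement: keep the fallback, but only where it is needed. */
static enum populate_path path_refined(const struct sys_model *s)
{
	if (s->swiotlb_active && device_needs_bounce(s))
		return PATH_TTM_DMA;
	return PATH_TTM_POOL;
}
```

In this model, a machine whose RAM fits under the device's DMA mask takes
the fast path under both the patch and the refinement, while a machine with
RAM above the mask and no IOMMU still needs the ttm_dma path, which is the
case the patch as posted would break.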