Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-12 Thread Robin Murphy

On 12/04/18 10:42, Christian König wrote:

Am 12.04.2018 um 08:26 schrieb Christoph Hellwig:

On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:

On 10/04/18 21:59, Sinan Kaya wrote:

Code is expecing to observe the same number of buffers returned from
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally especially for systems with IOMMU.
So why not fix said code? It's clearly not a real hardware 
limitation, and
the map_sg() APIs have potentially returned fewer than nents since 
forever,

so there's really no excuse.

Yes, relying on dma_map_sg returning the same number of entries as passed
it is completely bogus.


I agree that the common DRM functions should be able to deal with both, 
but we should let the driver side decide if it wants multiple addresses 
combined or not.





IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
IOMMU can do.

Disagree; this is a dodgy hack, since you'll now end up passing
scatterlists into dma_map_sg() which already violate max_seg_size to 
begin
with, and I think a conscientious DMA API implementation would be at 
rights
to fail the mapping for that reason (I know arm64 happens not to, but 
that

was a deliberate design decision to make my life easier at the time).

Agreed.


Sounds like my understanding of max_seg_size is incorrect, what exactly 
should that describe?


The segment size and boundary mask are there to represent a device's 
hardware scatter-gather capabilities - a lot of things have limits on 
the size and alignment of the data pointed to by a single descriptor 
(e.g. an xHCI TRB, where the data buffer must not cross a 64KB boundary) 
- although they can also be relevant to non-scatter-gather devices if 
they just have limits on how big/aligned a single DMA transfer can be.


Robin.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-12 Thread Christian König

Am 12.04.2018 um 08:26 schrieb Christoph Hellwig:

On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:

On 10/04/18 21:59, Sinan Kaya wrote:

Code is expecing to observe the same number of buffers returned from
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally especially for systems with IOMMU.

So why not fix said code? It's clearly not a real hardware limitation, and
the map_sg() APIs have potentially returned fewer than nents since forever,
so there's really no excuse.

Yes, relying on dma_map_sg returning the same number of entries as passed
it is completely bogus.


I agree that the common DRM functions should be able to deal with both, 
but we should let the driver side decide if it wants multiple addresses 
combined or not.





IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
IOMMU can do.

Disagree; this is a dodgy hack, since you'll now end up passing
scatterlists into dma_map_sg() which already violate max_seg_size to begin
with, and I think a conscientious DMA API implementation would be at rights
to fail the mapping for that reason (I know arm64 happens not to, but that
was a deliberate design decision to make my life easier at the time).

Agreed.


Sounds like my understanding of max_seg_size is incorrect, what exactly 
should that describe?


Thanks,
Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-12 Thread Christoph Hellwig
On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
> On 10/04/18 21:59, Sinan Kaya wrote:
>> Code is expecing to observe the same number of buffers returned from
>> dma_map_sg() function compared to sg_alloc_table_from_pages(). This
>> doesn't hold true universally especially for systems with IOMMU.
>
> So why not fix said code? It's clearly not a real hardware limitation, and 
> the map_sg() APIs have potentially returned fewer than nents since forever, 
> so there's really no excuse.

Yes, relying on dma_map_sg returning the same number of entries as passed
it is completely bogus.

>> IOMMU driver tries to combine buffers into a single DMA address as much
>> as it can. The right thing is to tell the DMA layer how much combining
>> IOMMU can do.
>
> Disagree; this is a dodgy hack, since you'll now end up passing 
> scatterlists into dma_map_sg() which already violate max_seg_size to begin 
> with, and I think a conscientious DMA API implementation would be at rights 
> to fail the mapping for that reason (I know arm64 happens not to, but that 
> was a deliberate design decision to make my life easier at the time).

Agreed.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-11 Thread Robin Murphy

On 11/04/18 15:33, Sinan Kaya wrote:

On 4/11/2018 8:03 AM, Robin Murphy wrote:

On 10/04/18 21:59, Sinan Kaya wrote:

Code is expecing to observe the same number of buffers returned
from dma_map_sg() function compared to
sg_alloc_table_from_pages(). This doesn't hold true universally
especially for systems with IOMMU.


So why not fix said code? It's clearly not a real hardware
limitation, and the map_sg() APIs have potentially returned fewer
than nents since forever, so there's really no excuse.


Sure, I'll take a better fix if there is one.




IOMMU driver tries to combine buffers into a single DMA address
as much as it can. The right thing is to tell the DMA layer how
much combining IOMMU can do.


Disagree; this is a dodgy hack, since you'll now end up passing
scatterlists into dma_map_sg() which already violate max_seg_size
to begin with, and I think a conscientious DMA API implementation
would be at rights to fail the mapping for that reason (I know
arm64 happens not to, but that was a deliberate design decision to
make my life easier at the time).

As a short-term fix, at least do something like what i915 does and
constrain the table allocation to the desired segment size as well,
so things remain self-consistent. But still never claim that faking
a hardware constraint as a workaround for a driver shortcoming is
"the right thing to do" ;)


You are asking for something like this from here, right?

https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_gem_dmabuf.c#L58


I just looked for callers of __sg_alloc_table_from_pages() with a limit 
other than SCATTERLIST_MAX_SEGMENT, and found 
__i915_gem_userptr_alloc_pages(), which at first glance appears to be 
doing something vaguely similar to amdgpu_ttm_tt_pin_userptr().


Robin.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-11 Thread Robin Murphy

On 10/04/18 21:59, Sinan Kaya wrote:

Code is expecing to observe the same number of buffers returned from
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally especially for systems with IOMMU.


So why not fix said code? It's clearly not a real hardware limitation, 
and the map_sg() APIs have potentially returned fewer than nents since 
forever, so there's really no excuse.



IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
IOMMU can do.


Disagree; this is a dodgy hack, since you'll now end up passing 
scatterlists into dma_map_sg() which already violate max_seg_size to 
begin with, and I think a conscientious DMA API implementation would be 
at rights to fail the mapping for that reason (I know arm64 happens not 
to, but that was a deliberate design decision to make my life easier at 
the time).


As a short-term fix, at least do something like what i915 does and 
constrain the table allocation to the desired segment size as well, so 
things remain self-consistent. But still never claim that faking a 
hardware constraint as a workaround for a driver shortcoming is "the 
right thing to do" ;)


Robin.


Signed-off-by: Sinan Kaya 
---
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
  4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 8e28270..1b031eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -851,7 +851,7 @@ static int gmc_v6_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
}
-
+   dma_set_max_seg_size(adev->dev, PAGE_SIZE);
r = gmc_v6_0_init_microcode(adev);
if (r) {
dev_err(adev->dev, "Failed to load mc firmware!\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 86e9d682..0a4b2cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -999,6 +999,7 @@ static int gmc_v7_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
pr_warn("amdgpu: No coherent DMA available\n");
}
+   dma_set_max_seg_size(adev->dev, PAGE_SIZE);
  
  	r = gmc_v7_0_init_microcode(adev);

if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 9a813d8..b171529 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1096,6 +1096,7 @@ static int gmc_v8_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
pr_warn("amdgpu: No coherent DMA available\n");
}
+   dma_set_max_seg_size(adev->dev, PAGE_SIZE);
  
  	r = gmc_v8_0_init_microcode(adev);

if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3b7e7af..36e658ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -855,6 +855,7 @@ static int gmc_v9_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
}
+   dma_set_max_seg_size(adev->dev, PAGE_SIZE);
  
  	r = gmc_v9_0_mc_init(adev);

if (r)


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-11 Thread Christian König

Am 11.04.2018 um 08:26 schrieb Huang Rui:

On Tue, Apr 10, 2018 at 04:59:55PM -0400, Sinan Kaya wrote:

Code is expecing to observe the same number of buffers returned from
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally especially for systems with IOMMU.

IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
IOMMU can do.

Signed-off-by: Sinan Kaya 

Sinan, I guess Christian's suggestion is to add amdgpu_device_init function
like here:


Yes, exactly. Looks like Sinan just tried to place the call next to 
pci_set_dma_mask()/pci_set_consistent_dma_mask().


Not necessary a bad idea, but in this case not optimal.

Sinan please refine your patch as Rui suggested and resend.

Thanks,
Christian.



8<---
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0e798b3..9b96771 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2339,6 +2339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 /* init the mode config */
 drm_mode_config_init(adev->ddev);

+   dma_set_max_seg_size(adev->dev, PAGE_SIZE);
+
 r = amdgpu_device_ip_init(adev);
 if (r) {
 /* failed in exclusive mode due to timeout */
8<---

After that, we don't do it in each generation.

Thanks,
Ray
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-11 Thread Huang Rui
On Tue, Apr 10, 2018 at 04:59:55PM -0400, Sinan Kaya wrote:
> Code is expecing to observe the same number of buffers returned from
> dma_map_sg() function compared to sg_alloc_table_from_pages(). This
> doesn't hold true universally especially for systems with IOMMU.
> 
> IOMMU driver tries to combine buffers into a single DMA address as much
> as it can. The right thing is to tell the DMA layer how much combining
> IOMMU can do.
> 
> Signed-off-by: Sinan Kaya 

Sinan, I guess Christian's suggestion is to add amdgpu_device_init function
like here:

8<---
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0e798b3..9b96771 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2339,6 +2339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
/* init the mode config */
drm_mode_config_init(adev->ddev);

+   dma_set_max_seg_size(adev->dev, PAGE_SIZE);
+
r = amdgpu_device_ip_init(adev);
if (r) {
/* failed in exclusive mode due to timeout */
8<---

After that, we don't do it in each generation.

Thanks,
Ray
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx