[PATCH] iommu/exynos: Make driver independent of the system page size

2022-06-23 Thread Marek Szyprowski
PAGE_{SIZE,SHIFT} macros depend on the configured kernel page size, but the
SYSMMU hardware always operates on 4KB granularity, so the driver expects
values calculated for 4KB pages. Fix this.

Reported-by: Robin Murphy 
Signed-off-by: Marek Szyprowski 
---
Untested, because the Exynos based boards I have don't boot with a non-4KB
page size for other reasons.
---
 drivers/iommu/exynos-iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 71f2018e23fe..9c060505a46e 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -340,8 +340,7 @@ static void __sysmmu_set_ptbase(struct sysmmu_drvdata *data, phys_addr_t pgd)
if (MMU_MAJ_VER(data->version) < 5)
writel(pgd, data->sfrbase + REG_PT_BASE_ADDR);
else
-   writel(pgd >> PAGE_SHIFT,
-data->sfrbase + REG_V5_PT_BASE_PFN);
+   writel(pgd / SZ_4K, data->sfrbase + REG_V5_PT_BASE_PFN);
 
__sysmmu_tlb_invalidate(data);
 }
@@ -551,7 +550,7 @@ static void sysmmu_tlb_invalidate_entry(struct sysmmu_drvdata *data,
 * 64KB page can be one of 16 consecutive sets.
 */
if (MMU_MAJ_VER(data->version) == 2)
-   num_inv = min_t(unsigned int, size / PAGE_SIZE, 64);
+   num_inv = min_t(unsigned int, size / SZ_4K, 64);
 
if (sysmmu_block(data)) {
__sysmmu_tlb_invalidate_entry(data, iova, num_inv);
-- 
2.17.1



Re: [RFC 2/3] iommu/samsung: Introduce Exynos sysmmu-v8 driver

2022-06-22 Thread Marek Szyprowski

On 22.06.2022 11:14, Robin Murphy wrote:
> On 2022-06-21 20:57, Sam Protsenko wrote:
>> Hi Marek,
>>
>> On Fri, 21 Jan 2022 at 14:31, Marek Szyprowski 
>>  wrote:
>>
>> [snip]
>>
>>>
>>> Well, for a starting point the existing exynos-iommu driver is really
>>> enough. I've played a bit with newer Exynos SoCs some time ago. If I
>>> remember right, if you limit the iommu functionality to the essential
>>> things like mapping pages to IO-virtual space, the hardware differences
>>> between SYSMMU v5 (already supported by the exynos-iommu driver) and v7
>>> are just a matter of changing one register during the initialization
>>> and different bits in the page fault reason decoding. You must of course
>>> rely on the DMA-mapping framework and its implementation based on
>>> mainline DMA-IOMMU helpers. All the code for custom iommu group(s)
>>> handling or extended fault management is not needed for the initial
>>> version.
>>>
>>
>> Thanks for the advice! Just implemented some testing driver, which
>> uses "Emulated Translation" registers available on SysMMU v7. That's
>> one way to verify the IOMMU driver with no actual users of it. It
>> works fine with vendor SysMMU driver I ported to mainline earlier, and
>> now I'm trying to use it with exynos-sysmmu driver (existing
>> upstream). If you're curious -- I can share the testing driver
>> somewhere on GitHub.
>>
>> I believe the register you mentioned is PT_BASE one, so I used
>> REG_V7_FLPT_BASE_VM = 0x800C instead of REG_V5_PT_BASE_PFN. But I
>> didn't manage to get that far, unfortunately, as
>> exynos_iommu_domain_alloc() function fails in my case, with BUG_ON()
>> at this line:
>>
>>  /* For mapping page table entries we rely on dma == phys */
>>  BUG_ON(handle != virt_to_phys(domain->pgtable));
>>
>> One possible explanation for this BUG is that "dma-ranges" property is
>> not provided in DTS (which seems to be the case right now for all
>> users of "samsung,exynos-sysmmu" driver). Because of that the SWIOTLB
>> is used for dma_map_single() call (in exynos_iommu_domain_alloc()
>> function), which in turn leads to that BUG. At least that's what
>> happens in my case. The call chain looks like this:
>>
>>  exynos_iommu_domain_alloc()
>>  v
>>  dma_map_single()
>>  v
>>  dma_map_single_attrs()
>>  v
>>  dma_map_page_attrs()
>>  v
>>  dma_direct_map_page()  // dma_capable() == false
>>  v
>>  swiotlb_map()
>>  v
>>  swiotlb_tbl_map_single()
>>
>> And the last call of course always returns an address different from
>> the address of the allocated pgtable. E.g. in my case I see this:
>>
>>  handle = 0xfbfff000
>>  virt_to_phys(domain->pgtable) = 0x000880d0c000
>>
>> Do you know what might be the reason for that? I just wonder how the
>> SysMMU driver works for all existing Exynos platforms right now. I feel
>> I might be missing something, like some DMA option should be enabled
>> so that SWIOTLB is not used, or something like that. Please let me
>> know if you have any idea on possible cause. The vendor's SysMMU
>> driver is kinda different in that regard, as it doesn't use
>> dma_map_single(), so I don't see such issue there.
>
> If this SysMMU version is capable of addressing more than 32 bits, 
> then exynos_iommu_probe_device() should set its DMA masks appropriately.

Well, Sam's question was about the Exynos SYSMMU's own platform device, 
not the device for which the Exynos SYSMMU manages the IO virtual address 
space.

Simply add something like

dma_set_mask(dev, DMA_BIT_MASK(36));

to the beginning of exynos_sysmmu_probe(). This will disable SWIOTLB and 
switch to DMA-direct for the Exynos SYSMMU platform device.
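
Something along these lines (a completely untested sketch; the 36-bit mask 
is just an example and has to match what the given SYSMMU revision can 
actually address):

static int exynos_sysmmu_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	int ret;

	/* let the SYSMMU's own page table allocations use DMA-direct */
	ret = dma_set_mask(dev, DMA_BIT_MASK(36));
	if (ret)
		return ret;
	...
}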


> (as a side note since I looked, the use of PAGE_SIZE/PAGE_SHIFT in the 
> driver looks wrong, since I can't imagine that the hardware knows 
> whether Linux is using 4KB, 16KB or 64KB and adjusts itself 
> accordingly...)

Right, this has to be cleaned up. This driver has never been used on systems 
with a kernel configured for non-4KB pages.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH V2 2/6] iommu: iova: properly handle 0 as a valid IOVA address

2022-06-17 Thread Marek Szyprowski
Hi Robin,

On 06.06.2022 14:38, Robin Murphy wrote:
> On 2022-06-02 16:58, Marek Szyprowski wrote:
>> On 23.05.2022 19:30, Robin Murphy wrote:
>>> On 2022-05-11 13:15, Ajay Kumar wrote:
>>>> From: Marek Szyprowski 
>>>>
>>>> Zero is a valid DMA and IOVA address on many architectures, so adjust the
>>>> IOVA management code to properly handle it. A new value IOVA_BAD_ADDR
>>>> (~0UL) is introduced as a generic value for the error case. Adjust all
>>>> callers of the alloc_iova_fast() function for the new return value.
>>>
>>> And when does anything actually need this? In fact if you were to stop
>>> iommu-dma from reserving IOVA 0 - which you don't - it would only show
>>> how patch #3 is broken.
>>
>> I don't get why you say that patch #3 is broken.
>
> My mistake, in fact it's already just as broken either way - I forgot 
> that that's done implicitly via iovad->start_pfn rather than an actual 
> reserve_iova() entry. I see there is an initial calculation attempting 
> to honour start_pfn in patch #3, but it's always immediately 
> overwritten. More critically, the rb_first()/rb_next walk means the 
> starting point can only creep inevitably upwards as older allocations 
> are freed, so will end up pathologically stuck at or above limit_pfn 
> and unable to recover. At best it's more "next-fit" than "first-fit", 
> and TBH the fact that it could ever even allocate anything at all in 
> an empty domain makes me want to change the definition of IOVA_ANCHOR ;)
>
>> The main issue with
>> address 0 being reserved appears when one uses a first-fit allocation
>> algorithm. In that case the first allocation will be at address 0,
>> which causes some issues. This patch simply makes the whole iova related
>> code able to handle such a case. This mimics the similar code already used
>> on ARM 32bit. There are drivers that rely on the way the IOVAs are
>> allocated. This is especially important for drivers for devices with
>> limited addressing capabilities, and such devices exist and can even
>> sit behind an IOMMU.
>>
>> Let's focus on the s5p-mfc driver and the way one of the supported
>> hardware variants does DMA. The hardware has a very limited DMA window
>> (256M) and addresses all memory buffers as offsets from the firmware base.
>> Currently it has been assumed that the first allocated buffer will be at
>> the beginning of the IOVA space. This worked fine on ARM 32bit, and
>> supporting this case is a target of this patchset.
>
> I still understand the use-case; I object to a solution where PFN 0 is 
> always reserved yet sometimes allocated. Not to mention the slightly 
> more fundamental thing of the whole lot not actually working the way 
> it's supposed to.
>
> I also remain opposed to baking in secret ABIs where allocators are 
> expected to honour a particular undocumented behaviour. FWIW I've had 
> plans for a while now to make the IOVA walk bidirectional to make the 
> existing retry case more efficient, at which point it's then almost 
> trivial to implement "bottom-up" allocation, which I'm thinking I 
> might then force on by default for CONFIG_ARM to minimise any further 
> surprises for v2 of the default domain conversion. However I'm 
> increasingly less convinced that it's really sufficient for your case, 
> and am leaning back towards the opinion that any driver with really 
> specific constraints on how its DMA addresses are laid out should not 
> be passively relying on a generic allocator to do exactly what it 
> wants. A simple flag saying "try to allocate DMA addresses bottom-up" 
> doesn't actually mean "allocate this thing on a 256MB-aligned address 
> and everything subsequent within a 256MB window above it", so let's 
> not build a fragile house of cards on top of pretending that it does.


Okay, I see your point. I'm fine with adding more of the IOVA allocation 
logic to the respective driver (s5p-mfc), however I would still like to 
use the dma-mapping framework and the helpers which depend on it (like 
v4l2-videobuf2, dma-buf, and so on). Would it be fine for you to let 
device drivers access the IOVA domains hidden by the DMA-IOMMU glue 
code, to mark certain parts of the IOVA space as reserved or to manually 
remap a buffer (like the firmware) with specific address requirements?
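
To make the question concrete, I mean something along these lines 
(pseudo-code; the helper for getting at the iova_domain behind the 
DMA-IOMMU cookie is hypothetical and does not exist in mainline):

	/* hypothetical helper: expose the iova_domain behind the cookie */
	struct iova_domain *iovad = iommu_dma_get_iova_domain(dev);

	/* keep the 256M firmware window away from the generic allocator */
	reserve_iova(iovad, iova_pfn(iovad, fw_base),
		     iova_pfn(iovad, fw_base + SZ_256M - 1));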


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH V2 2/6] iommu: iova: properly handle 0 as a valid IOVA address

2022-06-02 Thread Marek Szyprowski
Hi Robin,

On 23.05.2022 19:30, Robin Murphy wrote:
> On 2022-05-11 13:15, Ajay Kumar wrote:
>> From: Marek Szyprowski 
>>
>> Zero is a valid DMA and IOVA address on many architectures, so adjust the
>> IOVA management code to properly handle it. A new value IOVA_BAD_ADDR
>> (~0UL) is introduced as a generic value for the error case. Adjust all
>> callers of the alloc_iova_fast() function for the new return value.
>
> And when does anything actually need this? In fact if you were to stop 
> iommu-dma from reserving IOVA 0 - which you don't - it would only show 
> how patch #3 is broken.

I don't get why you say that patch #3 is broken. The main issue with 
address 0 being reserved appears when one uses a first-fit allocation 
algorithm. In that case the first allocation will be at address 0, 
which causes some issues. This patch simply makes the whole iova related 
code able to handle such a case. This mimics the similar code already used 
on ARM 32bit. There are drivers that rely on the way the IOVAs are 
allocated. This is especially important for drivers for devices with 
limited addressing capabilities, and such devices exist and can even 
sit behind an IOMMU.

Let's focus on the s5p-mfc driver and the way one of the supported 
hardware variants does DMA. The hardware has a very limited DMA window 
(256M) and addresses all memory buffers as offsets from the firmware base. 
Currently it has been assumed that the first allocated buffer will be at 
the beginning of the IOVA space. This worked fine on ARM 32bit, and 
supporting this case is a target of this patchset.


> Also note that it's really nothing to do with architecture either way; 
> iommu-dma simply chooses to reserve IOVA 0 for its own convenience, 
> mostly because it can. Much the same way that 0 is typically a valid 
> CPU VA, but mapping something meaningful there is just asking for a 
> world of pain debugging NULL-dereference bugs.

Right, it is not directly related to the architecture, but it is a much 
more common case on architectures where DMA (IOVA) address = physical 
address + some offset. The commit message can be rephrased, though.

I see no reason to forbid 0 as a valid IOVA. The DMA-mapping core also 
uses DMA_MAPPING_ERROR as ~0. It looks like, as long as the last-fit 
allocation algorithm is used, no one has ever needed to consider this 
case.
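
With the proposed IOVA_BAD_ADDR, the callers adjusted by this patch end up 
doing roughly the following (a sketch of the semantics described in the 
commit message, not the exact patch hunks):

	#define IOVA_BAD_ADDR	(~0UL)	/* proposed generic error value */

	iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift, true);
	if (iova == IOVA_BAD_ADDR)	/* instead of the old 'if (!iova)' */
		return DMA_MAPPING_ERROR;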


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH 00/13] iommu: Retire bus_set_iommu()

2022-04-14 Thread Marek Szyprowski
Hi Robin,

On 14.04.2022 14:42, Robin Murphy wrote:
> Hi all,
>
> Here's another chapter in my saga of moving to per-instance IOMMU ops -
> iommu_present() and iommu_capable() cleanups will be ongoing for another
> cycle or two, while this one is at least self-contained within the
> subsystem. The next steps after this are making iommu_domain_alloc()
> instance-aware - which should finish the public API - and pulling the
> fwnode/of_xlate bits into __iommu_probe_device(). And then making sense
> of whatever's left :)
>
> For ease of review here I split out individual driver patches based on
> whether there was any non-trivial change or affect on control flow; the
> straightforward deletions are all lumped together since the whole series
> needs applying together either way, but I'm happy to split the final
> patch up further if anyone would like.
>
> Patch #3 for AMD is based on Mario's SWIOTLB patch here:
>
> https://lore.kernel.org/linux-iommu/20220404204723.9767-1-mario.limoncie...@amd.com/
>
> since that wants merging first as fix material. The series is also based
> contextually (but not functionally) on my device_iommu_capable() patches
> here:
>
> https://lore.kernel.org/linux-iommu/cover.1649089693.git.robin.mur...@arm.com/
>
> since those are pretty much good to go now (I'll send a slightly-tweaked
> final version once the iommu/core branch is open).

Works fine on Samsung Exynos based boards (both ARM 32bit and 64bit).

Tested-by: Marek Szyprowski  # for Exynos relevant changes


>
> Thanks,
> Robin.
>
>
> Robin Murphy (13):
>iommu: Always register bus notifiers
>iommu: Move bus setup to IOMMU device registration
>iommu/amd: Clean up bus_set_iommu()
>iommu/arm-smmu: Clean up bus_set_iommu()
>iommu/arm-smmu-v3: Clean up bus_set_iommu()
>iommu/dart: Clean up bus_set_iommu()
>iommu/exynos: Clean up bus_set_iommu()
>iommu/ipmmu-vmsa: Clean up bus_set_iommu()
>iommu/mtk: Clean up bus_set_iommu()
>iommu/omap: Clean up bus_set_iommu()
>iommu/tegra-smmu: Clean up bus_set_iommu()
>iommu/virtio: Clean up bus_set_iommu()
>iommu: Clean up bus_set_iommu()
>
>   drivers/iommu/amd/amd_iommu.h   |   1 -
>   drivers/iommu/amd/init.c|   9 +-
>   drivers/iommu/amd/iommu.c   |  21 
>   drivers/iommu/apple-dart.c  |  30 +-
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  53 +-
>   drivers/iommu/arm/arm-smmu/arm-smmu.c   |  84 +--
>   drivers/iommu/arm/arm-smmu/qcom_iommu.c |   4 -
>   drivers/iommu/exynos-iommu.c|   9 --
>   drivers/iommu/fsl_pamu_domain.c |   4 -
>   drivers/iommu/intel/iommu.c |   1 -
>   drivers/iommu/iommu.c   | 109 +---
>   drivers/iommu/ipmmu-vmsa.c  |  35 +--
>   drivers/iommu/msm_iommu.c   |   2 -
>   drivers/iommu/mtk_iommu.c   |  13 +--
>   drivers/iommu/mtk_iommu_v1.c|  13 +--
>   drivers/iommu/omap-iommu.c  |   6 --
>   drivers/iommu/rockchip-iommu.c  |   2 -
>   drivers/iommu/s390-iommu.c  |   6 --
>   drivers/iommu/sprd-iommu.c  |   5 -
>   drivers/iommu/sun50i-iommu.c|   2 -
>   drivers/iommu/tegra-smmu.c  |  29 ++
>   drivers/iommu/virtio-iommu.c|  24 -
>   include/linux/iommu.h   |   1 -
>   23 files changed, 62 insertions(+), 401 deletions(-)
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [RFC 2/3] iommu/samsung: Introduce Exynos sysmmu-v8 driver

2022-01-21 Thread Marek Szyprowski
Well, for a starting point the existing exynos-iommu driver is really 
enough. I've played a bit with newer Exynos SoCs some time ago. If I 
remember right, if you limit the iommu functionality to the essential 
things like mapping pages to IO-virtual space, the hardware differences 
between SYSMMU v5 (already supported by the exynos-iommu driver) and v7 
are just a matter of changing one register during the initialization 
and different bits in the page fault reason decoding. You must of course 
rely on the DMA-mapping framework and its implementation based on 
mainline DMA-IOMMU helpers. All the code for custom iommu group(s) 
handling or extended fault management is not needed for the initial 
version.

The IOMMU driver on its own doesn't really make much sense, so you need 
another driver/device pair which will make use of it. You have 
mentioned DPU, so you are trying to bring up the display stack. Please 
check the existing Exynos DRM driver(s). They nicely use the DMA-mapping 
framework and are really modular, so adding hw-specific sub-drivers for 
Exynos850 shouldn't be that hard. Don't expect that the vendor's drivers 
based on custom frameworks will work there, though.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH 11/11] dma-direct: add a dma_direct_use_pool helper

2021-12-08 Thread Marek Szyprowski
Hi Christoph,

On 11.11.2021 07:50, Christoph Hellwig wrote:
> Add a helper to check if a potentially blocking operation should
> dip into the atomic pools.
>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Robin Murphy 
> ---
>   kernel/dma/direct.c | 18 --
>   1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 924937c54e8ab..d0a317ed8f029 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -156,6 +156,15 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   return page;
>   }
>   
> +/*
> + * Check if a potentially blocking operations needs to dip into the atomic
> + * pools for the given device/gfp.
> + */
> +static bool dma_direct_use_pool(struct device *dev, gfp_t gfp)
> +{
> + return gfpflags_allow_blocking(gfp) && !is_swiotlb_for_alloc(dev);
This should be:

return !gfpflags_allow_blocking(gfp) && !is_swiotlb_for_alloc(dev);

otherwise all dma allocations fail badly on ARM64, which is what happens on 
today's linux-next (plenty of "Failed to get suitable pool for XYZ" messages).

Do you want me to send a fixup patch or would you simply fix it in your tree?

> +}
> +
>   static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
>   dma_addr_t *dma_handle, gfp_t gfp)
>   {
> @@ -235,8 +244,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>*/
>   remap = IS_ENABLED(CONFIG_DMA_DIRECT_REMAP);
>   if (remap) {
> - if (!gfpflags_allow_blocking(gfp) &&
> - !is_swiotlb_for_alloc(dev))
> + if (dma_direct_use_pool(dev, gfp))
>   return dma_direct_alloc_from_pool(dev, size,
>   dma_handle, gfp);
>   } else {
> @@ -250,8 +258,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>* Decrypting memory may block, so allocate the memory from the atomic
>* pools if we can't block.
>*/
> - if (force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
> - !is_swiotlb_for_alloc(dev))
> + if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
>   return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
>   
>   /* we always manually zero the memory once we are done */
> @@ -360,8 +367,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
>   struct page *page;
>   void *ret;
>   
> - if (force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
> - !is_swiotlb_for_alloc(dev))
> + if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
>   return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
>   
>   page = __dma_direct_alloc_pages(dev, size, gfp);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH] iommu/dma: Tidy up Kconfig selects

2021-09-28 Thread Marek Szyprowski
On 28.09.2021 11:30, Joerg Roedel wrote:
> On Mon, Sep 13, 2021 at 01:41:19PM +0100, Robin Murphy wrote:
>> Now that the dust has settled on converting all the x86 drivers to
>> iommu-dma, we can punt the Kconfig selection to arch code where it
>> was always intended to be.
> Can we select IOMMU_DMA under IOMMU_SUPPORT instead? The only drivers
> not using IOMMU_DMA are the arm32 ones, afaics.
>
> If we could get rid of the arm32 exception, the IOMMU_DMA symbol could
> also go away entirely and we handle it under IOMMU_SUPPORT instead. But
> that is something for the future :)

Maybe it would be a good motivation to get back to 
https://lore.kernel.org/linux-iommu/cover.1597931875.git.robin.mur...@arm.com/ 
:)

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-09-01 Thread Marek Szyprowski
On 21.05.2021 05:03, Wang Xingang wrote:
> From: Xingang Wang 
>
> When booting with devicetree, the pci_request_acs() is called after the
> enumeration and initialization of PCI devices, thus the ACS is not
> enabled. And ACS should be enabled when IOMMU is detected for the
> PCI host bridge, so add check for IOMMU before probe of PCI host and call
> pci_request_acs() to make sure ACS will be enabled when enumerating PCI
> devices.
>
> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 

This patch landed in linux-next as commit 57a4ab1584e6 ("iommu/of: Fix 
pci_request_acs() before enumerating PCI devices"). Sadly it breaks PCI 
operation on the ARM Juno R1 board (arch/arm64/boot/dts/arm/juno-r1.dts). It 
looks like the IOMMU controller is not probed for some reason:

# cat /sys/kernel/debug/devices_deferred
2b60.iommu

Reverting this patch on top of current linux-next fixes this issue. If 
you need more information to debug this issue, just let me know.

> ---
>   drivers/iommu/of_iommu.c | 1 -
>   drivers/pci/of.c | 8 +++-
>   2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..54a14da242cc 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   .np = master_np,
>   };
>   
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>				      of_pci_iommu_init, &info);
>   } else {
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index da5b414d585a..2313c3f848b0 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
>   
>   int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
>   {
> - if (!dev->of_node)
> + struct device_node *node = dev->of_node;
> +
> + if (!node)
>   return 0;
>   
> + /* Detect IOMMU and make sure ACS will be enabled */
> + if (of_property_read_bool(node, "iommu-map"))
> + pci_request_acs();
> +
>   bridge->swizzle_irq = pci_common_swizzle;
>   bridge->map_irq = of_irq_parse_and_map_pci;
>   

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH v3 05/25] iommu/exynos: Drop IOVA cookie management

2021-08-05 Thread Marek Szyprowski


On 04.08.2021 19:15, Robin Murphy wrote:
> The core code bakes its own cookies now.
>
> CC: Marek Szyprowski 
> Signed-off-by: Robin Murphy 

Acked-by: Marek Szyprowski 

Tested-by: Marek Szyprowski 

> ---
>
> v3: Also remove unneeded include
> ---
>   drivers/iommu/exynos-iommu.c | 19 ---
>   1 file changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index d0fbf1d10e18..939ffa768986 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -21,7 +21,6 @@
>   #include 
>   #include 
>   #include 
> -#include 
>   
>   typedef u32 sysmmu_iova_t;
>   typedef u32 sysmmu_pte_t;
> @@ -735,20 +734,16 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
>   /* Check if correct PTE offsets are initialized */
>   BUG_ON(PG_ENT_SHIFT < 0 || !dma_dev);
>   
> + if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
> + return NULL;
> +
>   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
>   if (!domain)
>   return NULL;
>   
> - if (type == IOMMU_DOMAIN_DMA) {
> -	if (iommu_get_dma_cookie(&domain->domain) != 0)
> - goto err_pgtable;
> - } else if (type != IOMMU_DOMAIN_UNMANAGED) {
> - goto err_pgtable;
> - }
> -
>   domain->pgtable = (sysmmu_pte_t *)__get_free_pages(GFP_KERNEL, 2);
>   if (!domain->pgtable)
> - goto err_dma_cookie;
> + goto err_pgtable;
>   
>   	domain->lv2entcnt = (short *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
>   if (!domain->lv2entcnt)
> @@ -779,9 +774,6 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
>   free_pages((unsigned long)domain->lv2entcnt, 1);
>   err_counter:
>   free_pages((unsigned long)domain->pgtable, 2);
> -err_dma_cookie:
> - if (type == IOMMU_DOMAIN_DMA)
> -		iommu_put_dma_cookie(&domain->domain);
>   err_pgtable:
>   kfree(domain);
>   return NULL;
> @@ -809,9 +801,6 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain)
>   
>   	spin_unlock_irqrestore(&domain->lock, flags);
>   
> - if (iommu_domain->type == IOMMU_DOMAIN_DMA)
> - iommu_put_dma_cookie(iommu_domain);
> -
>   dma_unmap_single(dma_dev, virt_to_phys(domain->pgtable), LV1TABLE_SIZE,
>DMA_TO_DEVICE);
>   

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH v3 01/25] iommu: Pull IOVA cookie management into the core

2021-08-05 Thread Marek Szyprowski


On 04.08.2021 19:15, Robin Murphy wrote:
> Now that everyone has converged on iommu-dma for IOMMU_DOMAIN_DMA
> support, we can abandon the notion of drivers being responsible for the
> cookie type, and consolidate all the management into the core code.
>
> CC: Marek Szyprowski 
> CC: Yoshihiro Shimoda 
> CC: Geert Uytterhoeven 
> CC: Yong Wu 
> CC: Heiko Stuebner 
> CC: Chunyan Zhang 
> CC: Maxime Ripard 
> Reviewed-by: Jean-Philippe Brucker 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Robin Murphy 
Tested-by: Marek Szyprowski 
>
> ---
>
> v3: Use a simpler temporary check instead of trying to be clever with
>  the error code
> ---
>   drivers/iommu/iommu.c | 7 +++
>   include/linux/iommu.h | 3 ++-
>   2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f2cda9950bd5..b65fcc66ffa4 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -7,6 +7,7 @@
>   #define pr_fmt(fmt)"iommu: " fmt
>   
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -1946,6 +1947,11 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>   /* Assume all sizes by default; the driver may override this later */
>   domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>   
> +	/* Temporarily avoid -EEXIST while drivers still get their own cookies */
> +	if (type == IOMMU_DOMAIN_DMA && !domain->iova_cookie && iommu_get_dma_cookie(domain)) {
> + iommu_domain_free(domain);
> + domain = NULL;
> + }
>   return domain;
>   }
>   
> @@ -1957,6 +1963,7 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>   
>   void iommu_domain_free(struct iommu_domain *domain)
>   {
> + iommu_put_dma_cookie(domain);
>   domain->ops->domain_free(domain);
>   }
>   EXPORT_SYMBOL_GPL(iommu_domain_free);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 4997c78e2670..141779d76035 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -40,6 +40,7 @@ struct iommu_domain;
>   struct notifier_block;
>   struct iommu_sva;
>   struct iommu_fault_event;
> +struct iommu_dma_cookie;
>   
>   /* iommu fault flags */
>   #define IOMMU_FAULT_READ0x0
> @@ -86,7 +87,7 @@ struct iommu_domain {
>   iommu_fault_handler_t handler;
>   void *handler_token;
>   struct iommu_domain_geometry geometry;
> - void *iova_cookie;
> + struct iommu_dma_cookie *iova_cookie;
>   };
>   
>   enum iommu_cap {

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH] iommu: qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"

2021-07-06 Thread Marek Szyprowski
On 05.07.2021 16:30, Amey Narkhede wrote:
> On 21/07/05 08:56AM, Marek Szyprowski wrote:
>> The QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
>> which fails for the second and subsequent IOMMU devices. This is intended and
>> must not be fatal to the driver registration process. Also, the cleanup
>> path should take care of the runtime PM state, which is missing in the
>> current patch. Revert the relevant changes to the QCOM IOMMU driver until
>> a proper fix is prepared.
>>
> Apologies for the broken patch, I don't have any arm machine to test the
> patches. Is this bug unique to qcom iommu?

Frankly, I have no idea. Just grep for bus_set_iommu() and check if the 
caller might be executed more than once.

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



[PATCH] iommu: qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"

2021-07-05 Thread Marek Szyprowski
The QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
which fails for the second and subsequent IOMMU devices. This is intended and
must not be fatal to the driver registration process. Also, the cleanup
path should take care of the runtime PM state, which is missing in the
current patch. Revert the relevant changes to the QCOM IOMMU driver until
a proper fix is prepared.

This partially reverts commit 249c9dc6aa0db74a0f7908efd04acf774e19b155.

Fixes: 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error path")
Suggested-by: Will Deacon 
Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 25ed444ff94d..021cf8f65ffc 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -849,12 +849,10 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
	ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
if (ret) {
dev_err(dev, "Failed to register iommu\n");
-   goto err_sysfs_remove;
+   return ret;
}
 
-	ret = bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
-   if (ret)
-   goto err_unregister_device;
+	bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
 
if (qcom_iommu->local_base) {
pm_runtime_get_sync(dev);
@@ -863,13 +861,6 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
}
 
return 0;
-
-err_unregister_device:
-	iommu_device_unregister(&qcom_iommu->iommu);
-
-err_sysfs_remove:
-	iommu_device_sysfs_remove(&qcom_iommu->iommu);
-   return ret;
 }
 
 static int qcom_iommu_device_remove(struct platform_device *pdev)
-- 
2.17.1



Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-07-01 Thread Marek Szyprowski
On 01.07.2021 11:11, Robin Murphy wrote:
> On 2021-07-01 10:01, Will Deacon wrote:
>> On Thu, Jul 01, 2021 at 10:29:29AM +0200, Marek Szyprowski wrote:
>>> Hi Robin,
>>>
>>> On 30.06.2021 16:01, Robin Murphy wrote:
>>>> On 2021-06-30 14:48, Marek Szyprowski wrote:
>>>>> On 30.06.2021 14:59, Will Deacon wrote:
>>>>>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
>>>>>>> On 08.06.2021 18:45, Amey Narkhede wrote:
>>>>>>>> If device registration fails, remove sysfs attribute
>>>>>>>> and if setting bus callbacks fails, unregister the device
>>>>>>>> and cleanup the sysfs attribute.
>>>>>>>>
>>>>>>>> Signed-off-by: Amey Narkhede 
>>>>>>> This patch landed in linux-next some time ago as commit 
>>>>>>> 249c9dc6aa0d
>>>>>>> ("iommu/arm: Cleanup resources in case of probe error path"). After
>>>>>>> bisecting and some manual searching I finally found that it is
>>>>>>> responsible for breaking s2idle on DragonBoard 410c. Here is the 
>>>>>>> log
>>>>>>> (captured with no_console_suspend):
>>>>>>>
>>>>>>> # time rtcwake -s10 -mmem
>>>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:02:13 1970
>>>>>>> PM: suspend entry (s2idle)
>>>>>>> Filesystems sync: 0.002 seconds
>>>>>>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>>>>>>> OOM killer disabled.
>>>>>>> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) 
>>>>>>> done.
>>>>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>>>> 0070
>>>>>>> Mem abort info:
>>>>>>>     ESR = 0x9606
>>>>>>>     EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>>     SET = 0, FnV = 0
>>>>>>>     EA = 0, S1PTW = 0
>>>>>>>     FSC = 0x06: level 2 translation fault
>>>>>>> Data abort info:
>>>>>>>     ISV = 0, ISS = 0x0006
>>>>>>>     CM = 0, WnR = 0
>>>>>>> user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000
>>>>>>> [0070] pgd=080085c3c003, p4d=080085c3c003,
>>>>>>> pud=080088dcf003, pmd=
>>>>>>> Internal error: Oops: 9606 [#1] PREEMPT SMP
>>>>>>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
>>>>>>> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
>>>>>>> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
>>>>>>> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
>>>>>>> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
>>>>>>> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu 
>>>>>>> v4l2_mem2mem
>>>>>>> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
>>>>>>> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci
>>>>>>> qnoc_msm8916
>>>>>>> videodev mc icc_smd_rpm mdt_loader socinfo display_connector 
>>>>>>> rmtfs_mem
>>>>>>> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 
>>>>>>> #3592
>>>>>>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>>>>>>> pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>>>>> pc : msm_runtime_suspend+0x1c/0x60 [msm]
>>>>>>> lr : msm_pm_suspend+0x18/0x38 [msm]
>>>>>>> ...
>>>>>>> Call trace:
>>>>>>>    msm_runtime_suspend+0x1c/0x60 [msm]
>>>>>>>    msm_pm_suspend+0x18/0x38 [msm]
>>>>>>>    dpm_run_callback+0x84/0x378
>>>>>> I wonder if we're missing a pm_runtime_disable() call on the failure
>>>>>> path?
>>>>>> i.e. something like the diff below...
>>>>>
>>>>> I've checked and it doesn't fix anything.
>>>>
>>>> What's happened previously? Has an IOMMU actually failed to probe, or 
>>>> is this a fiddly "code movement unveils latent bug elsewhere" kind of 
>>>> thing?

Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-07-01 Thread Marek Szyprowski
Hi Robin,

On 30.06.2021 16:01, Robin Murphy wrote:
> On 2021-06-30 14:48, Marek Szyprowski wrote:
>> On 30.06.2021 14:59, Will Deacon wrote:
>>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
>>>> On 08.06.2021 18:45, Amey Narkhede wrote:
>>>>> If device registration fails, remove sysfs attribute
>>>>> and if setting bus callbacks fails, unregister the device
>>>>> and cleanup the sysfs attribute.
>>>>>
>>>>> Signed-off-by: Amey Narkhede 
>>>> This patch landed in linux-next some time ago as commit 249c9dc6aa0d
>>>> ("iommu/arm: Cleanup resources in case of probe error path"). After
>>>> bisecting and some manual searching I finally found that it is
>>>> responsible for breaking s2idle on DragonBoard 410c. Here is the log
>>>> (captured with no_console_suspend):
>>>>
>>>> # time rtcwake -s10 -mmem
>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:02:13 1970
>>>> PM: suspend entry (s2idle)
>>>> Filesystems sync: 0.002 seconds
>>>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>>>> OOM killer disabled.
>>>> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>> 0070
>>>> Mem abort info:
>>>>      ESR = 0x9606
>>>>      EC = 0x25: DABT (current EL), IL = 32 bits
>>>>      SET = 0, FnV = 0
>>>>      EA = 0, S1PTW = 0
>>>>      FSC = 0x06: level 2 translation fault
>>>> Data abort info:
>>>>      ISV = 0, ISS = 0x0006
>>>>      CM = 0, WnR = 0
>>>> user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000
>>>> [0070] pgd=080085c3c003, p4d=080085c3c003,
>>>> pud=080088dcf003, pmd=
>>>> Internal error: Oops: 9606 [#1] PREEMPT SMP
>>>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
>>>> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
>>>> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
>>>> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
>>>> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
>>>> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
>>>> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
>>>> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci 
>>>> qnoc_msm8916
>>>> videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
>>>> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
>>>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>>>> pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>> pc : msm_runtime_suspend+0x1c/0x60 [msm]
>>>> lr : msm_pm_suspend+0x18/0x38 [msm]
>>>> ...
>>>> Call trace:
>>>>     msm_runtime_suspend+0x1c/0x60 [msm]
>>>>     msm_pm_suspend+0x18/0x38 [msm]
>>>>     dpm_run_callback+0x84/0x378
>>> I wonder if we're missing a pm_runtime_disable() call on the failure 
>>> path?
>>> i.e. something like the diff below...
>>
>> I've checked and it doesn't fix anything.
>
> What's happened previously? Has an IOMMU actually failed to probe, or 
> is this a fiddly "code movement unveils latent bug elsewhere" kind of 
> thing? There doesn't look to be much capable of going wrong in 
> msm_runtime_suspend() itself, so is the DRM driver also in a broken 
> half-probed state where it's left its pm_runtime_ops behind without 
> its drvdata being valid?
>
I finally had some time to analyze this issue. It turned out that with 
this patch, the iommu fails to probe for the soc:iommu@1f08000 device, while 
it worked fine before. This happens because this patch adds a check for the 
return value of bus_set_iommu() in 
drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it 
probes successfully again. It looks like there are already iommu ops 
registered for the platform bus before qcom_iommu probes. On the other 
hand, if I remember correctly, they are not used during the device 
registration, but they are needed for some legacy stuff. I can send a 
patch restoring the old code flow if you think that this is the right solution.
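
Restoring the old flow means simply ignoring the bus_set_iommu() return 
value again, roughly:

	/*
	 * This may fail when another IOMMU instance has already registered
	 * ops for the platform bus; that is expected and must not fail the
	 * probe, so don't check the return value.
	 */
	bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);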

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-06-30 Thread Marek Szyprowski
On 30.06.2021 14:59, Will Deacon wrote:
> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
>> On 08.06.2021 18:45, Amey Narkhede wrote:
>>> If device registration fails, remove sysfs attribute
>>> and if setting bus callbacks fails, unregister the device
>>> and cleanup the sysfs attribute.
>>>
>>> Signed-off-by: Amey Narkhede 
>> This patch landed in linux-next some time ago as commit 249c9dc6aa0d
>> ("iommu/arm: Cleanup resources in case of probe error path"). After
>> bisecting and some manual searching I finally found that it is
>> responsible for breaking s2idle on DragonBoard 410c. Here is the log
>> (captured with no_console_suspend):
>>
>> # time rtcwake -s10 -mmem
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:02:13 1970
>> PM: suspend entry (s2idle)
>> Filesystems sync: 0.002 seconds
>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>> OOM killer disabled.
>> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 0070
>> Mem abort info:
>>     ESR = 0x9606
>>     EC = 0x25: DABT (current EL), IL = 32 bits
>>     SET = 0, FnV = 0
>>     EA = 0, S1PTW = 0
>>     FSC = 0x06: level 2 translation fault
>> Data abort info:
>>     ISV = 0, ISS = 0x0006
>>     CM = 0, WnR = 0
>> user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000
>> [0070] pgd=080085c3c003, p4d=080085c3c003,
>> pud=080088dcf003, pmd=
>> Internal error: Oops: 9606 [#1] PREEMPT SMP
>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
>> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
>> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
>> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
>> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
>> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
>> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
>> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci qnoc_msm8916
>> videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
>> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>> pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>> pc : msm_runtime_suspend+0x1c/0x60 [msm]
>> lr : msm_pm_suspend+0x18/0x38 [msm]
>> ...
>> Call trace:
>>    msm_runtime_suspend+0x1c/0x60 [msm]
>>    msm_pm_suspend+0x18/0x38 [msm]
>>    dpm_run_callback+0x84/0x378
> I wonder if we're missing a pm_runtime_disable() call on the failure path?
> i.e. something like the diff below...

I've checked and it doesn't fix anything.

> Will
>
> --->8
>
> diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> index 25ed444ff94d..ce8f354755d0 100644
> --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> @@ -836,14 +836,14 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
>  ret = devm_of_platform_populate(dev);
>  if (ret) {
>  dev_err(dev, "Failed to populate iommu contexts\n");
> -   return ret;
> +   goto err_pm_disable;
>  }
>   
>  	ret = iommu_device_sysfs_add(&qcom_iommu->iommu, dev, NULL,
>   dev_name(dev));
>  if (ret) {
>  dev_err(dev, "Failed to register iommu in sysfs\n");
> -   return ret;
> +   goto err_pm_disable;
>  }
>   
>  	ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
> @@ -869,6 +869,9 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
>   
>   err_sysfs_remove:
>  	iommu_device_sysfs_remove(&qcom_iommu->iommu);
> +
> +err_pm_disable:
> +   pm_runtime_disable(dev);
>  return ret;
>   }
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-06-30 Thread Marek Szyprowski
> -		return err;
> + goto err_sysfs_remove;
>   }
>
>   platform_set_drvdata(pdev, smmu);
> @@ -2187,10 +2187,19 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>* any device which might need it, so we want the bus ops in place
>* ready to handle default domain setup as soon as any SMMU exists.
>*/
> - if (!using_legacy_binding)
> -		return arm_smmu_bus_init(&arm_smmu_ops);
> + if (!using_legacy_binding) {
> +		err = arm_smmu_bus_init(&arm_smmu_ops);
> + if (err)
> + goto err_unregister_device;
> + }
>
>   return 0;
> +
> +err_unregister_device:
> +	iommu_device_unregister(&smmu->iommu);
> +err_sysfs_remove:
> +	iommu_device_sysfs_remove(&smmu->iommu);
> + return err;
>   }
>
>   static int arm_smmu_device_remove(struct platform_device *pdev)
> diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> index 4294abe389b2..b785d9fb7602 100644
> --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> @@ -850,10 +850,12 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
>   	ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
>   if (ret) {
>   dev_err(dev, "Failed to register iommu\n");
> - return ret;
> + goto err_sysfs_remove;
>   }
>
> -	bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
> +	ret = bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
> + if (ret)
> + goto err_unregister_device;
>
>   if (qcom_iommu->local_base) {
>   pm_runtime_get_sync(dev);
> @@ -862,6 +864,13 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
>   }
>
>   return 0;
> +
> +err_unregister_device:
> +	iommu_device_unregister(&qcom_iommu->iommu);
> +
> +err_sysfs_remove:
> +	iommu_device_sysfs_remove(&qcom_iommu->iommu);
> + return ret;
>   }
>
>   static int qcom_iommu_device_remove(struct platform_device *pdev)
> --
> 2.31.1
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH rdma-next v1 1/2] lib/scatterlist: Fix wrong update of orig_nents

2021-06-30 Thread Marek Szyprowski
_size;
> + while (table->total_nents) {
> + unsigned int alloc_size = table->total_nents;
>   
>   /*
>* If we have more than max_ents segments left,
>* then assign 'next' to the sg table after the current one.
> -  * sg_size is then one less than alloc size, since the last
> -  * element is the chain pointer.
>*/
>   if (alloc_size > curr_max_ents) {
>   			next = sg_chain_ptr(&sgl[curr_max_ents - 1]);
>   alloc_size = curr_max_ents;
> - sg_size = alloc_size - 1;
> - } else {
> - sg_size = alloc_size;
> - next = NULL;
>   }
>   
> - table->orig_nents -= sg_size;
> + table->total_nents -= alloc_size;
>   if (nents_first_chunk)
>   nents_first_chunk = 0;
>   else
> @@ -301,20 +294,11 @@ int __sg_alloc_table(struct sg_table *table, unsigned int nents,
>   } else {
>   sg = alloc_fn(alloc_size, gfp_mask);
>   }
> - if (unlikely(!sg)) {
> - /*
> -  * Adjust entry count to reflect that the last
> -  * entry of the previous table won't be used for
> -  * linkage.  Without this, sg_kfree() may get
> -  * confused.
> -  */
> - if (prv)
> - table->nents = ++table->orig_nents;
> -
> + if (unlikely(!sg))
>   return -ENOMEM;
> - }
>   
>   sg_init_table(sg, alloc_size);
> + table->total_nents += alloc_size;
>   table->nents = table->orig_nents += sg_size;
>   
>   /*
> @@ -385,12 +369,11 @@ static struct scatterlist *get_next_sg(struct sg_table *table,
>   if (!new_sg)
>   return ERR_PTR(-ENOMEM);
>   sg_init_table(new_sg, alloc_size);
> + table->total_nents += alloc_size;
>   if (cur) {
>   __sg_chain(next_sg, new_sg);
> - table->orig_nents += alloc_size - 1;
>   } else {
>       table->sgl = new_sg;
> - table->orig_nents = alloc_size;
>   table->nents = 0;
>   }
>   return new_sg;
> @@ -515,6 +498,7 @@ struct scatterlist *__sg_alloc_table_from_pages(struct sg_table *sgt,
>   cur_page = j;
>   }
>   sgt->nents += added_nents;
> + sgt->orig_nents = sgt->nents;
>   out:
>   if (!left_pages)
>   sg_mark_end(s);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH v3 3/9] iommu/arm-smmu: Implement ->probe_finalize()

2021-06-15 Thread Marek Szyprowski
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 6f72c4d208ca..d20ce4d57df2 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1450,6 +1450,18 @@ static void arm_smmu_release_device(struct device *dev)
>   iommu_fwspec_free(dev);
>   }
>   
> +static void arm_smmu_probe_finalize(struct device *dev)
> +{
> + struct arm_smmu_master_cfg *cfg;
> + struct arm_smmu_device *smmu;
> +
> + cfg = dev_iommu_priv_get(dev);
> + smmu = cfg->smmu;
> +
> + if (smmu->impl->probe_finalize)
> + smmu->impl->probe_finalize(smmu, dev);
> +}
> +
>   static struct iommu_group *arm_smmu_device_group(struct device *dev)
>   {
>   struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
> @@ -1569,6 +1581,7 @@ static struct iommu_ops arm_smmu_ops = {
>   .iova_to_phys   = arm_smmu_iova_to_phys,
>   .probe_device   = arm_smmu_probe_device,
>   .release_device = arm_smmu_release_device,
> + .probe_finalize = arm_smmu_probe_finalize,
>   .device_group   = arm_smmu_device_group,
>   .enable_nesting = arm_smmu_enable_nesting,
>   .set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index c31a59d35c64..147c95e7c59c 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -439,6 +439,7 @@ struct arm_smmu_impl {
> struct device *dev, int start);
>   void (*write_s2cr)(struct arm_smmu_device *smmu, int idx);
>   void (*write_sctlr)(struct arm_smmu_device *smmu, int idx, u32 reg);
> +	void (*probe_finalize)(struct arm_smmu_device *smmu, struct device *dev);
>   };
>   
>   #define INVALID_SMENDX  -1

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH 1/1] dma: coherent: check no-map property for arm64

2021-06-15 Thread Marek Szyprowski
Hi Christoph,

On 14.06.2021 17:36, Christoph Hellwig wrote:
> On Mon, Jun 14, 2021 at 04:34:05PM +0100, Robin Murphy wrote:
>>> Looking at the rmem_dma_device_init() -> dma_init_coherent_memory(), it
>>> ends up calling memremap(MEMREMAP_WC) which would warn if it intersects
>>> with system RAM regardless of the architecture. If the memory region is
>>> nomap, it doesn't end up as IORESOURCE_SYSTEM_RAM, so memremap() won't
>>> warn. But why is this specific only to arm (or arm64)?
>> Didn't some ARMv7 implementations permit unexpected cache hits for the
>> non-cacheable address if the same PA has been speculatively fetched via the
>> cacheable alias?
> If we care about that we need to change these platforms to change the
> cache attributes of the kernel direct mapping instead of using vmap.
> We already have code to do that for openrisc, someone just needs to
> write the glue code for other platforms.

AFAIR there is a problem with changing cache attributes of the direct 
mappings on ARM 32bit. The whole lowmem is mapped with 2M 'section' 
mappings. Changing cache attributes causes two issues. First, one would 
need to split a 2M entry into 4K entries. Second, the 2M section mappings 
for the whole lowmem area are located in the per-process page tables. 
Changing the cache attributes would require locking all processes and 
iterating over their page table entries, which is a huge task.

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH] iommu: exynos: remove unneeded local variable initialization

2021-04-09 Thread Marek Szyprowski
On 08.04.2021 22:16, Krzysztof Kozlowski wrote:
> The initialization of 'fault_addr' local variable is not needed as it is
> shortly after overwritten.
>
> Addresses-Coverity: Unused value
> Signed-off-by: Krzysztof Kozlowski 

Acked-by: Marek Szyprowski 

> ---
>   drivers/iommu/exynos-iommu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index de324b4eedfe..8fa9a591fb96 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -407,7 +407,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
>   struct sysmmu_drvdata *data = dev_id;
>   const struct sysmmu_fault_info *finfo;
>   unsigned int i, n, itype;
> - sysmmu_iova_t fault_addr = -1;
> + sysmmu_iova_t fault_addr;
>   unsigned short reg_status, reg_clear;
>   int ret = -ENOSYS;
>   

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH v2 5/6] media: uvcvideo: Use dma_alloc_noncontiguos API

2020-11-25 Thread Marek Szyprowski
gt, uvc_urb->pages,
> + PAGE_ALIGN(stream->urb_size) >> PAGE_SHIFT, 0,
> + stream->urb_size, GFP_KERNEL)) {
> + vunmap(uvc_urb->buffer);
> + dma_free_noncontiguous(dma_dev, stream->urb_size,
> +uvc_urb->pages, uvc_urb->dma);
> + return false;
> + }
> +
> + return true;
> +}
> +#else
> +static bool uvc_alloc_urb_buffer(struct uvc_streaming *stream,
> +  struct uvc_urb *uvc_urb, gfp_t gfp_flags)
> +{
> + uvc_urb->buffer = kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
> +
> + return uvc_urb->buffer != NULL;
> +}
> +#endif
> +
>   /*
>* Allocate transfer buffers. This function can be called with buffers
>* already allocated when resuming from suspend, in which case it will
> @@ -1607,19 +1674,11 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
>   
>   /* Retry allocations until one succeed. */
>   for (; npackets > 1; npackets /= 2) {
> + stream->urb_size = psize * npackets;
>   for (i = 0; i < UVC_URBS; ++i) {
>   		struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
>   
> - stream->urb_size = psize * npackets;
> -#ifndef CONFIG_DMA_NONCOHERENT
> - uvc_urb->buffer = usb_alloc_coherent(
> - stream->dev->udev, stream->urb_size,
> -			gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
> -#else
> - uvc_urb->buffer =
> - kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
> -#endif
> - if (!uvc_urb->buffer) {
> + if (!uvc_alloc_urb_buffer(stream, uvc_urb, gfp_flags)) {
>   uvc_free_urb_buffers(stream);
>   break;
>   }
> diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> index a3dfacf069c4..3e6618a2ac82 100644
> --- a/drivers/media/usb/uvc/uvcvideo.h
> +++ b/drivers/media/usb/uvc/uvcvideo.h
> @@ -532,6 +532,8 @@ struct uvc_urb {
>   
>   char *buffer;
>   dma_addr_t dma;
> + struct page **pages;
> + struct sg_table sgt;
>   
>   unsigned int async_operations;
>   struct uvc_copy_op copy_operations[UVC_MAX_PACKETS];

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH] iommu: Clarify .of_xlate assumptions

2020-10-30 Thread Marek Szyprowski
Hi Robin,

On 29.10.2020 16:34, Robin Murphy wrote:
> A common idiom for .of_xlate is to use of_find_device_by_node() to
> retrieve the relevant IOMMU instance data when translating IOMMU
> specifiers for a client device. Although it's slightly roundabout,
> this is simply looking up something we know exists - if it *were*
> to return NULL, that would mean that no platform device is associated
> with the given DT node, therefore the driver couldn't have probed
> and called iommu_device_register() with the ops that .of_xlate was
> called from in the first place. By construction, we can also assume
> that the instance data for any registered IOMMU must be valid.
>
> This isn't necessarily obvious at first glance, though, so add some
> comments to document these assumptions in-place.
>
> Suggested-by: Yu Kuai 
> Signed-off-by: Robin Murphy 
> ---
>   drivers/iommu/arm/arm-smmu/qcom_iommu.c |  7 ---
>   drivers/iommu/exynos-iommu.c    | 15 ++-
Acked-by: Marek Szyprowski 
>   drivers/iommu/ipmmu-vmsa.c  | 13 ++---
>   drivers/iommu/mtk_iommu.c   |  8 
>   drivers/iommu/rockchip-iommu.c  |  5 -
>   drivers/iommu/sun50i-iommu.c|  5 -
>   6 files changed, 28 insertions(+), 25 deletions(-)
>
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [Linaro-mm-sig] [PATCH v5 05/38] drm: prime: use sgtable iterators in drm_prime_sg_to_page_addr_arrays()

2020-09-30 Thread Marek Szyprowski
Hi All,

On 25.09.2020 23:23, Alex Deucher wrote:
> On Tue, Sep 22, 2020 at 2:28 AM Marek Szyprowski
>  wrote:
>> On 22.09.2020 01:15, Alex Goins wrote:
>>> Tested-by: Alex Goins 
>>>
>>> This change fixes a regression with drm_prime_sg_to_page_addr_arrays() and
>>> AMDGPU in v5.9.
>> Thanks for testing!
>>
>>> Commit 39913934 similarly revamped AMDGPU to use sgtable helper functions. 
>>> When
>>> it changed from dma_map_sg_attrs() to dma_map_sgtable(), as a side effect it
>>> started correctly updating sgt->nents to the return value of 
>>> dma_map_sg_attrs().
>>> However, drm_prime_sg_to_page_addr_arrays() incorrectly uses sgt->nents to
>>> iterate over pages, rather than sgt->orig_nents, resulting in it now 
>>> returning
>>> the incorrect number of pages on AMDGPU.
>>>
>>> I had written a patch that changes drm_prime_sg_to_page_addr_arrays() to use
>>> for_each_sgtable_sg() instead of for_each_sg(), iterating using 
>>> sgt->orig_nents:
>>>
>>> -   for_each_sg(sgt->sgl, sg, sgt->nents, count) {
>>> +   for_each_sgtable_sg(sgt, sg, count) {
>>>
>>> This patch takes it further, but still has the effect of fixing the number 
>>> of
>>> pages that drm_prime_sg_to_page_addr_arrays() returns. Something like this
>>> should be included in v5.9 to prevent a regression with AMDGPU.
>> Probably the easiest way to handle a fix for v5.9 would be to simply
>> merge the latest version of this patch also to v5.9-rcX:
>> https://lore.kernel.org/dri-devel/20200904131711.12950-3-m.szyprow...@samsung.com/
>>
>>
>> This way we would get it fixed and avoid possible conflict in the -next.
>> Do you have any AMDGPU fixes for v5.9 in the queue? Maybe you can add
>> that patch to the queue? Dave: would it be okay that way?
> I think this should go into drm-misc for 5.9 since it's an update to
> drm_prime.c.  Is that patch ready to merge?
> Acked-by: Alex Deucher 

Maarten, Maxime or Thomas: could you take this one:

https://lore.kernel.org/dri-devel/20200904131711.12950-3-m.szyprow...@samsung.com/

also to drm-misc-fixes for v5.9-rc?

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH 05/18] ARM/dma-mapping: Switch to iommu_dma_ops

2020-09-28 Thread Marek Szyprowski
Hi Robin,

On 20.08.2020 17:08, Robin Murphy wrote:
> With the IOMMU ops now looking much the same shape as iommu_dma_ops,
> switch them out in favour of the iommu-dma library, currently enhanced
> with temporary workarounds that allow it to also sit underneath the
> arch-specific API. With that in place, we can now start converting the
> remaining IOMMU drivers and consumers to work with IOMMU API default
> domains instead.
>
> Signed-off-by: Robin Murphy 

I've played a bit longer with this and found that reading the kernel 
virtual address of the buffers allocated via dma_alloc_attrs() from the 
dma-iommu ops returns garbage from time to time. It took me a while to 
debug this...

Your conversion misses adding arch_dma_prep_coherent() to arch/arm, so 
the buffers are cleared by the mm allocator, but the caches are NOT 
flushed for the newly allocated buffers.

This fixes the issue:

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index fec3e59215b8..8b60bcc5b14f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2,6 +2,7 @@
  config ARM
     bool
     default y
+   select ARCH_HAS_DMA_PREP_COHERENT
     select ARCH_32BIT_OFF_T
     select ARCH_HAS_BINFMT_FLAT
     select ARCH_HAS_DEBUG_VIRTUAL if MMU
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ff6c4962161a..6954681b73da 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -266,6 +266,20 @@ static void __dma_clear_buffer(struct page *page, 
size_t size, int coherent_flag
     }
  }

+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+
+   if (PageHighMem(page)) {
+   phys_addr_t base = __pfn_to_phys(page_to_pfn(page));
+   phys_addr_t end = base + size;
+   outer_flush_range(base, end);
+   } else {
+   void *ptr = page_address(page);
+   dmac_flush_range(ptr, ptr + size);
+   outer_flush_range(__pa(ptr), __pa(ptr) + size);
+   }
+}
+
  /*
   * Allocate a DMA buffer for 'dev' of size 'size' using the
   * specified gfp mask.  Note that 'size' must be page aligned.

I also wonder if it would be better to use per-arch __dma_clear_buffer() 
instead of setting __GFP_ZERO unconditionally in dma-iommu.c. This 
should be faster on ARM with highmem...

 > ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-28 Thread Marek Szyprowski
Hi Robin,

On 24.09.2020 13:06, Robin Murphy wrote:
> On 2020-09-24 11:47, Marek Szyprowski wrote:
>> On 24.09.2020 12:40, Robin Murphy wrote:
>>> On 2020-09-24 11:16, Thierry Reding wrote:
>>>> On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:
>>>>> On 24.09.2020 10:28, Joerg Roedel wrote:
>>>>>> On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
>>>>>>> It allows to remap given buffer at the specific IOVA address,
>>>>>>> although
>>>>>>> it doesn't guarantee that those specific addresses won't be later
>>>>>>> used
>>>>>>> by the IOVA allocator. Probably it would make sense to add an 
>>>>>>> API for
>>>>>>> generic IOMMU-DMA framework to mark the given IOVA range as
>>>>>>> reserved/unused to protect them.
>>>>>> There is an API for that, the IOMMU driver can return IOVA reserved
>>>>>> regions per device and the IOMMU core code will take care of mapping
>>>>>> these regions and reserving them in the IOVA allocator, so that
>>>>>> DMA-IOMMU code will not use it for allocations.
>>>>>>
>>>>>> Have a look at the iommu_ops->get_resv_regions() and
>>>>>> iommu_ops->put_resv_regions().
>>>>>
>>>>> I know about the reserved regions IOMMU API, but the main problem 
>>>>> here,
>>>>> in case of Exynos, is that those reserved regions won't be created by
>>>>> the IOMMU driver but by the IOMMU client device. It is just a result
>>>>> how
>>>>> the media drivers manages their IOVA space. They simply have to load
>>>>> firmware at the IOVA address lower than the any address of the used
>>>>> buffers.
>>>>
>>>> I've been working on adding a way to automatically add direct mappings
>>>> using reserved-memory regions parsed from device tree, see:
>>>>
>>>> https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/
>>>>  
>>>>
>>>>
>>>> Perhaps this can be of use? With that you should be able to add a
>>>> reserved-memory region somewhere in the lower range that you need for
>>>> firmware images and have that automatically added as a direct mapping
>>>> so that it won't be reused later on for dynamic allocations.
>>>
>>> It can't easily be a *direct* mapping though - if the driver has to
>>> use the DMA masks to ensure that everything stays within the
>>> addressable range, then (as far as I'm aware) there's no physical
>>> memory that low down to equal the DMA addresses.
>>>
>>> TBH I'm not convinced that this is a sufficiently common concern to
>>> justify new APIs, or even to try to make overly generic. I think just
>>> implementing a new DMA attribute to say "please allocate/map this
>>> particular request at the lowest DMA address possible" would be good
>>> enough. Such a thing could also serve PCI drivers that actually care
>>> about SAC/DAC to give us more of a chance of removing the "try a
>>> 32-bit mask first" trick from everyone's hotpath...
>>
>> Hmm, I like the idea of such DMA attribute! It should make things really
>> simple, especially in the drivers. Thanks for the great idea! I will try
>> to implement it then instead of the workarounds I've proposed in
>> s5p-mfc/exynos4-is drivers.
>
> Right, I think it's fair to draw a line and say that anyone who wants 
> a *specific* address needs to manage their own IOMMU domain.
>
> In the backend I suspect it's going to be cleanest to implement a 
> dedicated iova_alloc_low() (or similar) function in the IOVA API that 
> sidesteps all of the existing allocation paths and goes straight to 
> the rbtree.

Just for the record - I've implemented this approach here: 
https://lore.kernel.org/lkml/20200925141218.13550-1-m.szyprow...@samsung.com/

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



[PATCH 7/8] media: platform: exynos4-is: use DMA_ATTR_LOW_ADDRESS

2020-09-25 Thread Marek Szyprowski
The Exynos4-IS driver relied on the way the ARM DMA-IOMMU glue code
worked - mainly on the fact that the allocator used a first-fit algorithm
and that the first allocated buffer was at the 0x0 DMA/IOVA address. This
is not true for the generic IOMMU-DMA glue code that will be used for the
ARM architecture soon, so add the needed DMA attribute to force such
behavior of the DMA-mapping code.

Signed-off-by: Marek Szyprowski 
---
 drivers/media/platform/exynos4-is/fimc-is.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/exynos4-is/fimc-is.c 
b/drivers/media/platform/exynos4-is/fimc-is.c
index 41b841a96338..9d3556eae5d3 100644
--- a/drivers/media/platform/exynos4-is/fimc-is.c
+++ b/drivers/media/platform/exynos4-is/fimc-is.c
@@ -335,8 +335,9 @@ static int fimc_is_alloc_cpu_memory(struct fimc_is *is)
 {
struct device *dev = &is->pdev->dev;
 
-   is->memory.vaddr = dma_alloc_coherent(dev, FIMC_IS_CPU_MEM_SIZE,
-  &is->memory.addr, GFP_KERNEL);
+   is->memory.vaddr = dma_alloc_attrs(dev, FIMC_IS_CPU_MEM_SIZE,
+  &is->memory.addr, GFP_KERNEL,
+  DMA_ATTR_LOW_ADDRESS);
if (is->memory.vaddr == NULL)
return -ENOMEM;
 
-- 
2.17.1



[PATCH 4/8] iommu: dma-iommu: refactor iommu_dma_alloc_iova()

2020-09-25 Thread Marek Szyprowski
Change the parameters passed to iommu_dma_alloc_iova(): the dma_limit can
be easily extracted from the parameters of the passed struct device, so
replace it with a flags parameter, which can later hold more information
about the way the IOVA allocator should do its job. While touching the
parameter list, move struct device to the second position to better match
the convention of the DMA-mapping related functions.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/dma-iommu.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 91dd8f46dae1..0ea87023306f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -400,12 +400,16 @@ static int dma_info_to_prot(enum dma_data_direction dir, 
bool coherent,
}
 }
 
+#define DMA_ALLOC_IOVA_COHERENT    BIT(0)
+
 static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
-   size_t size, u64 dma_limit, struct device *dev)
+   struct device *dev, size_t size, unsigned int flags)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
unsigned long shift, iova_len, iova = IOVA_BAD_ADDR;
+   u64 dma_limit = (flags & DMA_ALLOC_IOVA_COHERENT) ?
+   dev->coherent_dma_mask : dma_get_mask(dev);
 
if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
cookie->msi_iova += size;
@@ -481,7 +485,7 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot, u64 dma_mask)
+   size_t size, int prot, unsigned int flags)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -494,7 +498,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
 
size = iova_align(iovad, size + iova_off);
 
-   iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
+   iova = iommu_dma_alloc_iova(domain, dev, size, flags);
if (iova == DMA_MAPPING_ERROR)
return iova;
 
@@ -618,7 +622,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
return NULL;
 
size = iova_align(iovad, size);
-   iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
+   iova = iommu_dma_alloc_iova(domain, dev, size, DMA_ALLOC_IOVA_COHERENT);
if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
@@ -733,7 +737,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
 
-   dma_handle = __iommu_dma_map(dev, phys, size, prot, dma_get_mask(dev));
+   dma_handle = __iommu_dma_map(dev, phys, size, prot, 0);
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(phys, size, dir);
@@ -888,7 +892,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = s;
}
 
-   iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
+   iova = iommu_dma_alloc_iova(domain, dev, iova_len, 0);
if (iova == DMA_MAPPING_ERROR)
goto out_restore_sg;
 
@@ -936,8 +940,7 @@ static dma_addr_t iommu_dma_map_resource(struct device 
*dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
return __iommu_dma_map(dev, phys, size,
-   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
-   dma_get_mask(dev));
+   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO, 0);
 }
 
 static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
@@ -1045,7 +1048,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
return NULL;
 
*handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
-   dev->coherent_dma_mask);
+ DMA_ALLOC_IOVA_COHERENT);
if (*handle == DMA_MAPPING_ERROR) {
__iommu_dma_free(dev, size, cpu_addr);
return NULL;
@@ -1182,7 +1185,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
if (!msi_page)
return NULL;
 
-   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
+   iova = iommu_dma_alloc_iova(domain, dev, size, 0);
if (iova == DMA_MAPPING_ERROR)
goto out_free_page;
 
-- 
2.17.1



[PATCH 5/8] iommu: dma-iommu: add support for DMA_ATTR_LOW_ADDRESS

2020-09-25 Thread Marek Szyprowski
Implement support for the DMA_ATTR_LOW_ADDRESS DMA attribute. If it has
been set, call alloc_iova_first_fit() instead of alloc_iova_fast() to
allocate the new IOVA from the beginning of the address space.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/dma-iommu.c | 50 +--
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0ea87023306f..ab39659c727a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -401,6 +401,18 @@ static int dma_info_to_prot(enum dma_data_direction dir, 
bool coherent,
 }
 
 #define DMA_ALLOC_IOVA_COHERENT    BIT(0)
+#define DMA_ALLOC_IOVA_FIRST_FIT   BIT(1)
+
+static unsigned int dma_attrs_to_alloc_flags(unsigned long attrs, bool 
coherent)
+{
+   unsigned int flags = 0;
+
+   if (coherent)
+   flags |= DMA_ALLOC_IOVA_COHERENT;
+   if (attrs & DMA_ATTR_LOW_ADDRESS)
+   flags |= DMA_ALLOC_IOVA_FIRST_FIT;
+   return flags;
+}
 
 static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
struct device *dev, size_t size, unsigned int flags)
@@ -433,13 +445,23 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,
dma_limit = min(dma_limit, (u64)domain->geometry.aperture_end);
 
/* Try to get PCI devices a SAC address */
-   if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
-   iova = alloc_iova_fast(iovad, iova_len,
-  DMA_BIT_MASK(32) >> shift, false);
+   if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev)) {
+   if (unlikely(flags & DMA_ALLOC_IOVA_FIRST_FIT))
+   iova = alloc_iova_first_fit(iovad, iova_len,
+   DMA_BIT_MASK(32) >> shift);
+   else
+   iova = alloc_iova_fast(iovad, iova_len,
+ DMA_BIT_MASK(32) >> shift, false);
+   }
 
-   if (iova == IOVA_BAD_ADDR)
-   iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
-  true);
+   if (iova == IOVA_BAD_ADDR) {
+   if (unlikely(flags & DMA_ALLOC_IOVA_FIRST_FIT))
+   iova = alloc_iova_first_fit(iovad, iova_len,
+   dma_limit >> shift);
+   else
+   iova = alloc_iova_fast(iovad, iova_len,
+  dma_limit >> shift, true);
+   }
 
if (iova != IOVA_BAD_ADDR)
return (dma_addr_t)iova << shift;
@@ -593,6 +615,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
struct iova_domain *iovad = &cookie->iovad;
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, true);
pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
@@ -622,7 +645,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
return NULL;
 
size = iova_align(iovad, size);
-   iova = iommu_dma_alloc_iova(domain, dev, size, DMA_ALLOC_IOVA_COHERENT);
+   iova = iommu_dma_alloc_iova(domain, dev, size, flags);
if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
@@ -732,12 +755,13 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
unsigned long offset, size_t size, enum dma_data_direction dir,
unsigned long attrs)
 {
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, false);
phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
 
-   dma_handle = __iommu_dma_map(dev, phys, size, prot, 0);
+   dma_handle = __iommu_dma_map(dev, phys, size, prot, flags);
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(phys, size, dir);
@@ -842,6 +866,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, false);
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
@@ -892,7 +917,7 @@ static int iommu_dma_map_sg(struct device *dev, st

[PATCH 6/8] media: platform: exynos4-is: remove all references to physical addresses

2020-09-25 Thread Marek Szyprowski
This driver always operates on DMA/IOVA addresses, so calling them
physical addresses is misleading, although when no IOMMU is used they
equal each other. Fix this by renaming all such entries to 'addr' and
adjusting the comments.

Signed-off-by: Marek Szyprowski 
---
 .../media/platform/exynos4-is/fimc-capture.c  |  6 ++--
 drivers/media/platform/exynos4-is/fimc-core.c | 28 +--
 drivers/media/platform/exynos4-is/fimc-core.h | 18 ++--
 drivers/media/platform/exynos4-is/fimc-is.c   | 20 ++---
 drivers/media/platform/exynos4-is/fimc-is.h   |  6 ++--
 .../media/platform/exynos4-is/fimc-lite-reg.c |  4 +--
 drivers/media/platform/exynos4-is/fimc-lite.c |  2 +-
 drivers/media/platform/exynos4-is/fimc-lite.h |  4 +--
 drivers/media/platform/exynos4-is/fimc-m2m.c  |  8 +++---
 drivers/media/platform/exynos4-is/fimc-reg.c  | 18 ++--
 drivers/media/platform/exynos4-is/fimc-reg.h  |  4 +--
 11 files changed, 58 insertions(+), 60 deletions(-)

diff --git a/drivers/media/platform/exynos4-is/fimc-capture.c 
b/drivers/media/platform/exynos4-is/fimc-capture.c
index 6000a4e789ad..13c838d3f947 100644
--- a/drivers/media/platform/exynos4-is/fimc-capture.c
+++ b/drivers/media/platform/exynos4-is/fimc-capture.c
@@ -201,7 +201,7 @@ void fimc_capture_irq_handler(struct fimc_dev *fimc, int 
deq_buf)
if (!list_empty(&cap->pending_buf_q)) {
 
v_buf = fimc_pending_queue_pop(cap);
-   fimc_hw_set_output_addr(fimc, &v_buf->paddr, cap->buf_index);
+   fimc_hw_set_output_addr(fimc, &v_buf->addr, cap->buf_index);
v_buf->index = cap->buf_index;
 
/* Move the buffer to the capture active queue */
@@ -410,7 +410,7 @@ static void buffer_queue(struct vb2_buffer *vb)
int min_bufs;
 
spin_lock_irqsave(&fimc->slock, flags);
-   fimc_prepare_addr(ctx, &buf->vb.vb2_buf, &ctx->d_frame, &buf->paddr);
+   fimc_prepare_addr(ctx, &buf->vb.vb2_buf, &ctx->d_frame, &buf->addr);
 
if (!test_bit(ST_CAPT_SUSPENDED, &fimc->state) &&
!test_bit(ST_CAPT_STREAM, &fimc->state) &&
@@ -419,7 +419,7 @@ static void buffer_queue(struct vb2_buffer *vb)
int buf_id = (vid_cap->reqbufs_count == 1) ? -1 :
vid_cap->buf_index;
 
-   fimc_hw_set_output_addr(fimc, &buf->paddr, buf_id);
+   fimc_hw_set_output_addr(fimc, &buf->addr, buf_id);
buf->index = vid_cap->buf_index;
fimc_active_queue_add(vid_cap, buf);
 
diff --git a/drivers/media/platform/exynos4-is/fimc-core.c 
b/drivers/media/platform/exynos4-is/fimc-core.c
index 08d1f39a914c..c989abeb478e 100644
--- a/drivers/media/platform/exynos4-is/fimc-core.c
+++ b/drivers/media/platform/exynos4-is/fimc-core.c
@@ -325,7 +325,7 @@ static irqreturn_t fimc_irq_handler(int irq, void *priv)
 
 /* The color format (colplanes, memplanes) must be already configured. */
 int fimc_prepare_addr(struct fimc_ctx *ctx, struct vb2_buffer *vb,
- struct fimc_frame *frame, struct fimc_addr *paddr)
+ struct fimc_frame *frame, struct fimc_addr *addr)
 {
int ret = 0;
u32 pix_size;
@@ -338,42 +338,40 @@ int fimc_prepare_addr(struct fimc_ctx *ctx, struct 
vb2_buffer *vb,
dbg("memplanes= %d, colplanes= %d, pix_size= %d",
frame->fmt->memplanes, frame->fmt->colplanes, pix_size);
 
-   paddr->y = vb2_dma_contig_plane_dma_addr(vb, 0);
+   addr->y = vb2_dma_contig_plane_dma_addr(vb, 0);
 
if (frame->fmt->memplanes == 1) {
switch (frame->fmt->colplanes) {
case 1:
-   paddr->cb = 0;
-   paddr->cr = 0;
+   addr->cb = 0;
+   addr->cr = 0;
break;
case 2:
/* decompose Y into Y/Cb */
-   paddr->cb = (u32)(paddr->y + pix_size);
-   paddr->cr = 0;
+   addr->cb = (u32)(addr->y + pix_size);
+   addr->cr = 0;
break;
case 3:
-   paddr->cb = (u32)(paddr->y + pix_size);
+   addr->cb = (u32)(addr->y + pix_size);
/* decompose Y into Y/Cb/Cr */
if (FIMC_FMT_YCBCR420 == frame->fmt->color)
-   paddr->cr = (u32)(paddr->cb
-   + (pix_size >> 2));
+   addr->cr = (u32)(addr->cb + (pix_size >> 2));
else /* 422 */
-   paddr->cr = (u32)(paddr->cb
-   + (pix_size >> 1));
+

[PATCH 8/8] media: platform: s5p-mfc: use DMA_ATTR_LOW_ADDRESS

2020-09-25 Thread Marek Szyprowski
The S5P-MFC driver relied on the way the ARM DMA-IOMMU glue code worked -
mainly on the fact that the allocator used a first-fit algorithm and that
the first allocated buffer was at the 0x0 DMA/IOVA address. This is not
true for the generic IOMMU-DMA glue code that will be used for the ARM
architecture soon, so limit the dma_mask to the size of the DMA window
the hardware can use and add the needed DMA attribute to force proper
IOVA allocation of the firmware buffer.

Signed-off-by: Marek Szyprowski 
---
 drivers/media/platform/s5p-mfc/s5p_mfc.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c 
b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index eba2b9f040df..171fd9fd22e4 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1199,8 +1199,12 @@ static int s5p_mfc_configure_common_memory(struct 
s5p_mfc_dev *mfc_dev)
if (!mfc_dev->mem_bitmap)
return -ENOMEM;
 
-   mfc_dev->mem_virt = dma_alloc_coherent(dev, mem_size,
-  &mfc_dev->mem_base, GFP_KERNEL);
+   /* MFC v5 can access memory only via the 256M window */
+   if (exynos_is_iommu_available(dev) && !IS_MFCV6_PLUS(mfc_dev))
+   dma_set_mask_and_coherent(dev, SZ_256M - 1);
+
+   mfc_dev->mem_virt = dma_alloc_attrs(dev, mem_size, &mfc_dev->mem_base,
+   GFP_KERNEL, DMA_ATTR_LOW_ADDRESS);
if (!mfc_dev->mem_virt) {
kfree(mfc_dev->mem_bitmap);
dev_err(dev, "failed to preallocate %ld MiB for the firmware 
and context buffers\n",
-- 
2.17.1
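
A side note on the mask value above (my own arithmetic, not part of the
patch): SZ_256M - 1 equals 0x0FFFFFFF, which is exactly DMA_BIT_MASK(28),
so the call could equivalently be written as:

	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(28));

Either form keeps every DMA address within the 256 MiB window that the
MFC v5 hardware can address.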



[PATCH 2/8] iommu: iova: properly handle 0 as a valid IOVA address

2020-09-25 Thread Marek Szyprowski
Zero is a valid DMA and IOVA address on many architectures, so adjust the
IOVA management code to properly handle it. A new value IOVA_BAD_ADDR
(~0UL) is introduced as a generic value for the error case. Adjust all
callers of the alloc_iova_fast() function for the new return value.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/dma-iommu.c   | 18 ++
 drivers/iommu/intel/iommu.c | 12 ++--
 drivers/iommu/iova.c| 10 ++
 include/linux/iova.h|  2 ++
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index cd6e3c70ebb3..91dd8f46dae1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -405,7 +405,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
-   unsigned long shift, iova_len, iova = 0;
+   unsigned long shift, iova_len, iova = IOVA_BAD_ADDR;
 
if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
cookie->msi_iova += size;
@@ -433,11 +433,13 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,
iova = alloc_iova_fast(iovad, iova_len,
   DMA_BIT_MASK(32) >> shift, false);
 
-   if (!iova)
+   if (iova == IOVA_BAD_ADDR)
iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
   true);
 
-   return (dma_addr_t)iova << shift;
+   if (iova != IOVA_BAD_ADDR)
+   return (dma_addr_t)iova << shift;
+   return DMA_MAPPING_ERROR;
 }
 
 static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
@@ -493,8 +495,8 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
size = iova_align(iovad, size + iova_off);
 
iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
-   if (!iova)
-   return DMA_MAPPING_ERROR;
+   if (iova == DMA_MAPPING_ERROR)
+   return iova;
 
if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) {
iommu_dma_free_iova(cookie, iova, size);
@@ -617,7 +619,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
 
size = iova_align(iovad, size);
iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
@@ -887,7 +889,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
}
 
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_restore_sg;
 
/*
@@ -1181,7 +1183,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
return NULL;
 
iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_free_page;
 
if (iommu_map(domain, iova, msi_addr, size, prot))
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 00963cedfd83..885d0dee39cc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3416,15 +3416,15 @@ static unsigned long intel_alloc_iova(struct device 
*dev,
 */
iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
   IOVA_PFN(DMA_BIT_MASK(32)), false);
-   if (iova_pfn)
+   if (iova_pfn != IOVA_BAD_ADDR)
return iova_pfn;
}
iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
   IOVA_PFN(dma_mask), true);
-   if (unlikely(!iova_pfn)) {
+   if (unlikely(iova_pfn == IOVA_BAD_ADDR)) {
dev_err_once(dev, "Allocating %ld-page iova failed\n",
 nrpages);
-   return 0;
+   return IOVA_BAD_ADDR;
}
 
return iova_pfn;
@@ -3454,7 +3454,7 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
size = aligned_nrpages(paddr, size);
 
iova_pfn = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size), dma_mask);
-   if (!iova_pfn)
+   if (iova_pfn == IOVA_BAD_ADDR)
goto error;
 
/*
@@ -3663,7 +3663,7 @@ static int intel_map_sg(struct device *dev, struct 
scatterlist *sglist, int nele
 
iova_pfn = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size),
*dev->dma_mask);
-   if (!iova_pfn) {
+   if (iova_pfn == IOVA_BAD_ADDR) {
sglist->dma_length = 0;
return 0;
   

[PATCH 1/8] dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute

2020-09-25 Thread Marek Szyprowski
Some devices require allocating a special buffer (usually for the
firmware) right at the beginning of the address space, to ensure that all
further allocations can be expressed as a positive offset from that
special buffer. When an IOMMU is used for managing the DMA address space,
such a requirement can be easily fulfilled, simply by enforcing the
'first-fit' IOVA allocation algorithm.

This patch adds a DMA attribute for such a case.

Signed-off-by: Marek Szyprowski 
---
 include/linux/dma-mapping.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index bb138ac6f5e6..c8c568ba375b 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -66,6 +66,12 @@
  * at least read-only at lesser-privileged levels).
  */
 #define DMA_ATTR_PRIVILEGED    (1UL << 9)
+/*
+ * DMA_ATTR_LOW_ADDRESS: used to indicate that the buffer should be allocated
+ * at the lowest possible DMA address, usually just at the beginning of the
+ * DMA/IOVA address space ('first-fit' allocation algorithm).
+ */
+#define DMA_ATTR_LOW_ADDRESS   (1UL << 10)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.
-- 
2.17.1
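
To illustrate how a driver is expected to use the new attribute, here is
a minimal sketch (not taken from this series - the function and device
are hypothetical and error handling is trimmed):

#include <linux/dma-mapping.h>

static int example_alloc_firmware(struct device *dev, size_t fw_size)
{
	dma_addr_t fw_dma;
	void *fw_virt;

	/* Ask for the lowest possible DMA address for the firmware. */
	fw_virt = dma_alloc_attrs(dev, fw_size, &fw_dma, GFP_KERNEL,
				  DMA_ATTR_LOW_ADDRESS);
	if (!fw_virt)
		return -ENOMEM;

	/* Buffers allocated later without the attribute should end up
	 * above fw_dma; note the attribute requests, but does not
	 * strictly guarantee, the lowest address. */
	return 0;
}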



[PATCH 3/8] iommu: iova: add support for 'first-fit' algorithm

2020-09-25 Thread Marek Szyprowski
Add support for the 'first-fit' allocation algorithm. It will be used for
the special case of implementing DMA_ATTR_LOW_ADDRESS, so this path
doesn't use the IOVA cache.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/iova.c | 78 
 include/linux/iova.h |  2 ++
 2 files changed, 80 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 87555ed1737a..0911d36f7ee5 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -227,6 +227,59 @@ static int __alloc_and_insert_iova_range(struct 
iova_domain *iovad,
return -ENOMEM;
 }
 
+static unsigned long
+__iova_get_aligned_start(unsigned long start, unsigned long size)
+{
+   unsigned long mask = __roundup_pow_of_two(size) - 1;
+
+   return (start + mask) & ~mask;
+}
+
+static int __alloc_and_insert_iova_range_forward(struct iova_domain *iovad,
+   unsigned long size, unsigned long limit_pfn,
+   struct iova *new)
+{
+   struct rb_node *curr;
+   unsigned long flags;
+   unsigned long start, limit;
+
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+
+   curr = rb_first(&iovad->rbroot);
+   limit = limit_pfn;
+   start = __iova_get_aligned_start(iovad->start_pfn, size);
+
+   while (curr) {
+   struct iova *curr_iova = rb_entry(curr, struct iova, node);
+   struct rb_node *next = rb_next(curr);
+
+   start = __iova_get_aligned_start(curr_iova->pfn_hi + 1, size);
+   if (next) {
+   struct iova *next_iova = rb_entry(next, struct iova, 
node);
+   limit = next_iova->pfn_lo - 1;
+   } else {
+   limit = limit_pfn;
+   }
+
+   if ((start + size) <= limit)
+   break;  /* found a free slot */
+   curr = next;
+   }
+
+   if (!curr && start + size > limit) {
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return -ENOMEM;
+   }
+
+   new->pfn_lo = start;
+   new->pfn_hi = new->pfn_lo + size - 1;
+   iova_insert_rbtree(&iovad->rbroot, new, curr);
+
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+
+   return 0;
+}
+
 static struct kmem_cache *iova_cache;
 static unsigned int iova_cache_users;
 static DEFINE_MUTEX(iova_cache_mutex);
@@ -398,6 +451,31 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(free_iova);
 
+/**
+ * alloc_iova_first_fit - allocates an iova from the beginning of address space
+ * @iovad: - iova domain in question
+ * @size: - size of page frames to allocate
+ * @limit_pfn: - max limit address
+ * Returns a pfn the allocated iova starts at or IOVA_BAD_ADDR in the case
+ * of a failure.
+ */
+unsigned long
+alloc_iova_first_fit(struct iova_domain *iovad, unsigned long size,
+unsigned long limit_pfn)
+{
+   struct iova *new_iova = alloc_iova_mem();
+
+   if (!new_iova)
+   return IOVA_BAD_ADDR;
+
+   if (__alloc_and_insert_iova_range_forward(iovad, size, limit_pfn, 
new_iova)) {
+   free_iova_mem(new_iova);
+   return IOVA_BAD_ADDR;
+   }
+   return new_iova->pfn_lo;
+}
+EXPORT_SYMBOL_GPL(alloc_iova_first_fit);
+
 /**
  * alloc_iova_fast - allocates an iova from rcache
  * @iovad: - iova domain in question
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 69737e6bcef6..01c29044488c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -152,6 +152,8 @@ void queue_iova(struct iova_domain *iovad,
unsigned long data);
 unsigned long alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
  unsigned long limit_pfn, bool flush_rcache);
+unsigned long alloc_iova_first_fit(struct iova_domain *iovad, unsigned long 
size,
+  unsigned long limit_pfn);
 struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
unsigned long pfn_hi);
 void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
-- 
2.17.1
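
To make the alignment step concrete, a worked example for
__iova_get_aligned_start() (values invented):

	/* size = 0x30 pfns: __roundup_pow_of_two(0x30) = 0x40, mask = 0x3f */
	__iova_get_aligned_start(0x101, 0x30) == (0x101 + 0x3f) & ~0x3f == 0x140

i.e. each candidate start is aligned up to the allocation size rounded to
a power of two before the free gap between two neighbouring rbtree nodes
is tested.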



[PATCH 0/8] IOMMU-DMA - support old allocation algorithm used on ARM

2020-09-25 Thread Marek Szyprowski
Hi,

This patchset is a continuation of the planned rework of the ARM
IOMMU/DMA-mapping code proposed by Robin Murphy in [1]. However, there
are drivers (for example S5P-MFC and Exynos4-IS) which depend on the way
the old ARM IOMMU/DMA-mapping glue code worked (it used a 'first-fit' IOVA
allocation algorithm), so before switching ARM to the generic code, such
drivers have to be updated.

This patchset provides the needed extensions to the generic IOMMU-DMA
framework to enable support for the drivers that relied on the old ARM
IOMMU/DMA-mapping behavior. This patchset is based on the idea proposed
by Robin Murphy in [2] after the discussion of the workaround implemented
directly in the mentioned drivers [3].

Here is a git branch with this patchset and [1] patches applied on top of
linux next-20200925:
https://github.com/mszyprow/linux/tree/v5.9-next-20200925-arm-dma-iommu-low-address

Best regards,
Marek Szyprowski


References:

[1] https://lore.kernel.org/lkml/cover.1597931875.git.robin.mur...@arm.com/
[2] 
https://lore.kernel.org/linux-iommu/bff57cbe-2247-05e1-9059-d9c66d64c...@arm.com/
[3] 
https://lore.kernel.org/linux-samsung-soc/20200918144833.14618-1-m.szyprow...@samsung.com/T/


Patch summary:

Marek Szyprowski (8):
  dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute
  iommu: iova: properly handle 0 as a valid IOVA address
  iommu: iova: add support for 'first-fit' algorithm
  iommu: dma-iommu: refactor iommu_dma_alloc_iova()
  iommu: dma-iommu: add support for DMA_ATTR_LOW_ADDRESS
  media: platform: exynos4-is: remove all references to physical
addresses
  media: platform: exynos4-is: use DMA_ATTR_LOW_ADDRESS
  media: platform: s5p-mfc: use DMA_ATTR_LOW_ADDRESS

 drivers/iommu/dma-iommu.c | 79 -
 drivers/iommu/intel/iommu.c   | 12 +--
 drivers/iommu/iova.c  | 88 ++-
 .../media/platform/exynos4-is/fimc-capture.c  |  6 +-
 drivers/media/platform/exynos4-is/fimc-core.c | 28 +++---
 drivers/media/platform/exynos4-is/fimc-core.h | 18 ++--
 drivers/media/platform/exynos4-is/fimc-is.c   | 23 ++---
 drivers/media/platform/exynos4-is/fimc-is.h   |  6 +-
 .../media/platform/exynos4-is/fimc-lite-reg.c |  4 +-
 drivers/media/platform/exynos4-is/fimc-lite.c |  2 +-
 drivers/media/platform/exynos4-is/fimc-lite.h |  4 +-
 drivers/media/platform/exynos4-is/fimc-m2m.c  |  8 +-
 drivers/media/platform/exynos4-is/fimc-reg.c  | 18 ++--
 drivers/media/platform/exynos4-is/fimc-reg.h  |  4 +-
 drivers/media/platform/s5p-mfc/s5p_mfc.c  |  8 +-
 include/linux/dma-mapping.h   |  6 ++
 include/linux/iova.h  |  4 +
 17 files changed, 221 insertions(+), 97 deletions(-)

-- 
2.17.1



Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Marek Szyprowski
Hi Robin,

On 24.09.2020 12:40, Robin Murphy wrote:
> On 2020-09-24 11:16, Thierry Reding wrote:
>> On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:
>>> On 24.09.2020 10:28, Joerg Roedel wrote:
>>>> On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
>>>>> It allows to remap given buffer at the specific IOVA address, 
>>>>> although
>>>>> it doesn't guarantee that those specific addresses won't be later 
>>>>> used
>>>>> by the IOVA allocator. Probably it would make sense to add an API for
>>>>> generic IOMMU-DMA framework to mark the given IOVA range as
>>>>> reserved/unused to protect them.
>>>> There is an API for that, the IOMMU driver can return IOVA reserved
>>>> regions per device and the IOMMU core code will take care of mapping
>>>> these regions and reserving them in the IOVA allocator, so that
>>>> DMA-IOMMU code will not use it for allocations.
>>>>
>>>> Have a look at the iommu_ops->get_resv_regions() and
>>>> iommu_ops->put_resv_regions().
>>>
>>> I know about the reserved regions IOMMU API, but the main problem here,
>>> in case of Exynos, is that those reserved regions won't be created by
>>> the IOMMU driver but by the IOMMU client device. It is just a result 
>>> how
>>> the media drivers manages their IOVA space. They simply have to load
>>> firmware at the IOVA address lower than the any address of the used
>>> buffers.
>>
>> I've been working on adding a way to automatically add direct mappings
>> using reserved-memory regions parsed from device tree, see:
>>
>> https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/
>>
>> Perhaps this can be of use? With that you should be able to add a
>> reserved-memory region somewhere in the lower range that you need for
>> firmware images and have that automatically added as a direct mapping
>> so that it won't be reused later on for dynamic allocations.
>
> It can't easily be a *direct* mapping though - if the driver has to 
> use the DMA masks to ensure that everything stays within the 
> addressable range, then (as far as I'm aware) there's no physical 
> memory that low down to equal the DMA addresses.
>
> TBH I'm not convinced that this is a sufficiently common concern to 
> justify new APIs, or even to try to make overly generic. I think just 
> implementing a new DMA attribute to say "please allocate/map this 
> particular request at the lowest DMA address possible" would be good 
> enough. Such a thing could also serve PCI drivers that actually care 
> about SAC/DAC to give us more of a chance of removing the "try a 
> 32-bit mask first" trick from everyone's hotpath...

Hmm, I like the idea of such DMA attribute! It should make things really 
simple, especially in the drivers. Thanks for the great idea! I will try 
to implement it then instead of the workarounds I've proposed in 
s5p-mfc/exynos4-is drivers.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Marek Szyprowski
Hi Thierry,

On 24.09.2020 12:16, Thierry Reding wrote:
> On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:
>> On 24.09.2020 10:28, Joerg Roedel wrote:
>>> On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
>>>> It allows to remap given buffer at the specific IOVA address, although
>>>> it doesn't guarantee that those specific addresses won't be later used
>>>> by the IOVA allocator. Probably it would make sense to add an API for
>>>> generic IOMMU-DMA framework to mark the given IOVA range as
>>>> reserved/unused to protect them.
>>> There is an API for that, the IOMMU driver can return IOVA reserved
>>> regions per device and the IOMMU core code will take care of mapping
>>> these regions and reserving them in the IOVA allocator, so that
>>> DMA-IOMMU code will not use it for allocations.
>>>
>>> Have a look at the iommu_ops->get_resv_regions() and
>>> iommu_ops->put_resv_regions().
>> I know about the reserved regions IOMMU API, but the main problem here,
>> in case of Exynos, is that those reserved regions won't be created by
>> the IOMMU driver but by the IOMMU client device. It is just a result how
>> the media drivers manages their IOVA space. They simply have to load
>> firmware at the IOVA address lower than the any address of the used
>> buffers.
> I've been working on adding a way to automatically add direct mappings
> using reserved-memory regions parsed from device tree, see:
>
>  
> https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/
>
> Perhaps this can be of use? With that you should be able to add a
> reserved-memory region somewhere in the lower range that you need for
> firmware images and have that automatically added as a direct mapping
> so that it won't be reused later on for dynamic allocations.

Frankly, using that would be an even bigger hack than what I've proposed in 
my workaround. I see no point in polluting the DT with such artificial 
regions just to ensure a specific IOVA space layout.

I'm just looking for a nice way of reusing most of the IOMMU DMA-mapping 
code with a small addition of manual IOVA space management (just to 
remap the firmware buffer at a specific IOVA address).

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Marek Szyprowski
Hi Joerg,

On 24.09.2020 10:28, Joerg Roedel wrote:
> On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
>> It allows to remap given buffer at the specific IOVA address, although
>> it doesn't guarantee that those specific addresses won't be later used
>> by the IOVA allocator. Probably it would make sense to add an API for
>> generic IOMMU-DMA framework to mark the given IOVA range as
>> reserved/unused to protect them.
> There is an API for that, the IOMMU driver can return IOVA reserved
> regions per device and the IOMMU core code will take care of mapping
> these regions and reserving them in the IOVA allocator, so that
> DMA-IOMMU code will not use it for allocations.
>
> Have a look at the iommu_ops->get_resv_regions() and
> iommu_ops->put_resv_regions().

I know about the reserved regions IOMMU API, but the main problem here, 
in the case of Exynos, is that those reserved regions won't be created by 
the IOMMU driver but by the IOMMU client device. It is just a result of 
how the media drivers manage their IOVA space. They simply have to load 
the firmware at an IOVA address lower than any address of the used buffers.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-23 Thread Marek Szyprowski
Hi Shaik,

I've run into a similar problem while adapting the S5P-MFC and Exynos4-IS 
drivers to the generic IOMMU-DMA framework. Here is my first solution: 
https://lore.kernel.org/linux-samsung-soc/20200918144833.14618-1-m.szyprow...@samsung.com/T/
 


It allows remapping a given buffer at a specific IOVA address, although 
it doesn't guarantee that those specific addresses won't later be used 
by the IOVA allocator. It would probably make sense to add an API to the 
generic IOMMU-DMA framework to mark a given IOVA range as reserved/unused 
to protect it.
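
A minimal sketch of what that solution does under the hood (illustrative 
only - the names and the fixed IOVA are invented, and the lack of a 
reservation API is exactly the gap mentioned above):

#include <linux/iommu.h>

static int example_map_fw_low(struct device *dev, phys_addr_t fw_phys,
			      size_t fw_size)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	if (!domain)
		return -ENODEV;

	/* Map the firmware at a fixed, low IOVA; nothing prevents the
	 * IOVA allocator from handing out the same range later. */
	return iommu_map(domain, 0x0, fw_phys, fw_size,
			 IOMMU_READ | IOMMU_WRITE);
}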

Best regards
Marek Szyprowski, PhD
Samsung R&D Institute Poland

On 24.04.2020 18:15, Shaik Ameer Basha wrote:
> On Fri, Apr 24, 2020 at 8:59 PM Robin Murphy  wrote:
>> On 2020-04-24 4:04 pm, Ajay kumar wrote:
>>> Can someone check this?
>>>
>>> On Mon, Apr 20, 2020 at 9:24 PM Ajay kumar  wrote:
>>>> Hi All,
>>>>
>>>> I have an IOMMU master which has limitations as mentioned below:
>>>> 1) The IOMMU master internally executes a firmware, and the firmware memory
>>>> is allocated by the same master driver.
>>>> The firmware buffer address should be lower than any other address
>>>> allocated by the device, or in other words, all the remaining buffer 
>>>> addresses
>>>> should always be in a higher range than the firmware address.
>>>> 2) None of the buffer addresses should go beyond 0xC000_0000
>> That particular constraint could (and perhaps should) be expressed as a
>> DMA mask/limit for the device, but if you have specific requirements to
> Yes Robin. We do use the 0xC000_0000 address to set the DMA mask in our driver.
>
>> place buffers at particular addresses then you might be better off
>> managing your own IOMMU domain like some other (mostly DRM) drivers do.
> If you remember any of such drivers can you please point the driver path ?
>
>> The DMA APIs don't offer any guarantees about what addresses you'll get
>> other than that they won't exceed the appropriate mask.
> True, we have gone through most of the APIs and didn't find any way to match 
> our
> requirements with the existing DMA APIs
>
>>>> example:
>>>> If firmware buffer address is buf_fw = 0x8000_5000;
>>>> All other addresses given to the device should be greater than
>>>> (0x8000_5000 + firmware size) and less than 0xC000_0000
>> Out of curiosity, how do you control that in the no-IOMMU or IOMMU
>> passthrough cases?
> We manage the no-IOMMU or pass through cases using the reserved-memory.
>
>> Robin.
>>
>>>> Currently, this is being handled with one of the below hacks:
>>>> 1) By keeping dma_mask in lower range while allocating firmware buffer,
>>>> and then increasing the dma_mask to higher range for other buffers.
>>>> 2) By reserving IOVA for firmware at the lowest range and creating direct 
>>>> mappings for the same.
>>>>
>>>> I want to know if there is a better way this can be handled with current 
>>>> framework,
>>>> or if anybody is facing similar problems with their devices,
>>>> please share how it is taken care.
>>>>
>>>> I also think there should be some way the masters can specify the IOVA
>>>> range they want to limit to for current allocation.
>>>> Something like a new iommu_ops callback like below:
>>>> limit_iova_alloc_range(dev, iova_start, iova_end)
>>>>
>>>> And, in my driver, the sequence will be:
>>>> limit_iova_alloc_range(dev, 0x0000_0000, 0x1000_0000); /* via helpers */
>>>> alloc( ) firmware buffer using DMA API
>>>> limit_iova_alloc_range(dev, 0x1000_0000, 0xC000_0000); /* via helpers */
>>>> alloc( ) other buffers using DMA API
>>>>
> Just want to understand more from you, on the new iommu_ops we suggested.
> Shouldn't the device have the flexibility to allocate IOVA as per its 
> requirement?
> If you see our device as example, we need to have control on the
> allocated IOVA region
> based on where device is using this buffer.
>
> If we have these callbacks in place, then the low level IOMMU driver
> can implement and
> manage such requests when needed.
>
> If this can't be taken forward for some right reasons, then we will
> definitely try to understand
> on how to manage the IOMMU domain from our driver as per your suggestion
>
> - Shaik.
>
>>>> Thanks,
>>>> Ajay Kumar



Re: [PATCH v5 05/38] drm: prime: use sgtable iterators in drm_prime_sg_to_page_addr_arrays()

2020-09-22 Thread Marek Szyprowski
Hi Alex,

On 22.09.2020 01:15, Alex Goins wrote:
> Tested-by: Alex Goins 
>
> This change fixes a regression with drm_prime_sg_to_page_addr_arrays() and
> AMDGPU in v5.9.

Thanks for testing!

> Commit 39913934 similarly revamped AMDGPU to use sgtable helper functions. 
> When
> it changed from dma_map_sg_attrs() to dma_map_sgtable(), as a side effect it
> started correctly updating sgt->nents to the return value of 
> dma_map_sg_attrs().
> However, drm_prime_sg_to_page_addr_arrays() incorrectly uses sgt->nents to
> iterate over pages, rather than sgt->orig_nents, resulting in it now returning
> the incorrect number of pages on AMDGPU.
>
> I had written a patch that changes drm_prime_sg_to_page_addr_arrays() to use
> for_each_sgtable_sg() instead of for_each_sg(), iterating using 
> sgt->orig_nents:
>
> -   for_each_sg(sgt->sgl, sg, sgt->nents, count) {
> +   for_each_sgtable_sg(sgt, sg, count) {
>
> This patch takes it further, but still has the effect of fixing the number of
> pages that drm_prime_sg_to_page_addr_arrays() returns. Something like this
> should be included in v5.9 to prevent a regression with AMDGPU.

Probably the easiest way to handle a fix for v5.9 would be to simply 
merge the latest version of this patch also to v5.9-rcX: 
https://lore.kernel.org/dri-devel/20200904131711.12950-3-m.szyprow...@samsung.com/
 


This way we would get it fixed and avoid possible conflict in the -next. 
Do you have any AMDGPU fixes for v5.9 in the queue? Maybe you can add 
that patch to the queue? Dave: would it be okay that way?

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH] iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()

2020-09-18 Thread Marek Szyprowski
Hi

On 18.09.2020 03:13, Yu Kuai wrote:
> if of_find_device_by_node() succeeds, exynos_iommu_of_xlate() doesn't have
> a corresponding put_device(). Thus add put_device() to fix the exception
> handling for this function implementation.
>
> Fixes: aa759fd376fb ("iommu/exynos: Add callback for initializing devices 
> from device tree")
> Signed-off-by: Yu Kuai 

Thanks for the fix!

Acked-by: Marek Szyprowski 

> ---
>   drivers/iommu/exynos-iommu.c | 8 ++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index bad3c0ce10cb..de324b4eedfe 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -1295,13 +1295,17 @@ static int exynos_iommu_of_xlate(struct device *dev,
>   return -ENODEV;
>   
>   data = platform_get_drvdata(sysmmu);
> - if (!data)
> + if (!data) {
> + put_device(&sysmmu->dev);
>   return -ENODEV;
> + }
>   
>   if (!owner) {
>   owner = kzalloc(sizeof(*owner), GFP_KERNEL);
> - if (!owner)
> + if (!owner) {
> + put_device(&sysmmu->dev);
>   return -ENOMEM;
> +     }
>   
>   INIT_LIST_HEAD(>controllers);
>   mutex_init(>rpm_lock);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH 00/18] Convert arch/arm to use iommu-dma

2020-09-18 Thread Marek Szyprowski
Hi all,

On 24.08.2020 13:40, Marek Szyprowski wrote:
> On 20.08.2020 17:08, Robin Murphy wrote:
>> After 5 years or so of intending to get round to this, finally the
>> time comes! The changes themselves actually turn out to be relatively
>> mechanical; the bigger concern appears to be how to get everything
>> merged across about 5 different trees given the dependencies.
>>
>> I've lightly boot-tested things on Rockchip RK3288 and Exynos 4412
>> (Odroid-U3), to the degree that their display drivers should be using
>> IOMMU-backed buffers and don't explode (the Odroid doesn't manage to
>> send a working HDMI signal to the one monitor I have that it actually
>> detects, but that's a pre-existing condition...) Confirmation that the
>> Mediatek, OMAP and Tegra changes work will be most welcome.
>>
>> Patches are based on 5.9-rc1, branch available here:
>>
>>    git://linux-arm.org/linux-rm arm/dma
>
> Well, my first proposal for the ARM and ARM64 DMA-mapping unification 
> has been posted long time ago: https://lkml.org/lkml/2016/2/19/79
>
> Thanks for resurrecting it! :)
>
> I've tested this patchset on various ARM 32-bit Exynos based boards (not 
> only Exynos4412) and most of them work fine after your conversion. 
> However, there are issues you cannot learn from the code.
>
> Conversion of the Exynos DRM was straightforward (thanks!), but there 
> are other Exynos drivers that depend on the old ARM implementation. 
> The S5P-MFC (only for the v5 hardware) and Exynos4 FIMC-ISP drivers 
> depend on the first-fit IOVA allocation algorithm in the old ARM 
> DMA-mapping code. This was the main reason I didn't continue my initial 
> conversion attempt.
>
> Both drivers allocate a buffer for their firmware and then, in the 
> hardware registers, address video buffers as an offset from the 
> beginning of the firmware. This doesn't work when the underlying 
> DMA-mapping code allocates IOVA with a last-fit algorithm, which is 
> what drivers/iommu/dma-iommu.c does. So far I didn't find a good 
> solution for that issue.
>
> I'm open to suggestions. One more limitation of the S5P-MFC driver 
> is that the hardware is capable of addressing only 128MiB. They will 
> probably need to call the IOMMU API directly, but I would like to keep 
> as much of the IOMMU/DMA-mapping code as possible.
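
To make that constraint concrete (all numbers invented):

	/* fw_iova  = 0x2000_0000;  firmware buffer, lowest allocated IOVA
	 * buf_iova = 0x2010_0000;  any later buffer
	 * reg_val  = buf_iova - fw_iova;  must fall in 0 .. SZ_128M - 1 */

A first-fit allocator naturally gives the firmware the lowest address, so 
every later buffer yields a valid non-negative offset; the top-down 
(last-fit) dma-iommu allocator breaks this assumption.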

Just for the record: I've finally managed to add the needed workarounds to 
both problematic Exynos4 drivers, so they work fine with this 
patchset. It turned out that it wasn't that hard:

https://lore.kernel.org/linux-samsung-soc/20200918144833.14618-1-m.szyprow...@samsung.com/T/#t

So from my side you have a green light to go ahead and switch ARM 32-bit 
to the generic code. Time to say goodbye to one of my biggest 
architecture-related contributions ever merged to mainline Linux. ;)

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH 14/18] drm/exynos: Consolidate IOMMU mapping code

2020-09-18 Thread Marek Szyprowski
(&g2d->runqueue_work);
> - exynos_drm_unregister_dma(g2d->drm_dev, dev, &g2d->dma_priv);
> + exynos_drm_unregister_dma(g2d->drm_dev, dev);
>   }
>   
>   static const struct component_ops g2d_component_ops = {
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_gsc.c 
> b/drivers/gpu/drm/exynos/exynos_drm_gsc.c
> index 45e9aee8366a..88b6fcaa20be 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_gsc.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_gsc.c
> @@ -97,7 +97,6 @@ struct gsc_scaler {
>   struct gsc_context {
>   struct exynos_drm_ipp ipp;
>   struct drm_device *drm_dev;
> - void    *dma_priv;
>   struct device   *dev;
>   struct exynos_drm_ipp_task  *task;
>   struct exynos_drm_ipp_formats   *formats;
> @@ -1170,7 +1169,7 @@ static int gsc_bind(struct device *dev, struct device 
> *master, void *data)
>   
>   ctx->drm_dev = drm_dev;
>   ctx->drm_dev = drm_dev;
> - exynos_drm_register_dma(drm_dev, dev, &ctx->dma_priv);
> + exynos_drm_register_dma(drm_dev, dev);
>   
>   exynos_drm_ipp_register(dev, ipp, _funcs,
>   DRM_EXYNOS_IPP_CAP_CROP | DRM_EXYNOS_IPP_CAP_ROTATE |
> @@ -1190,7 +1189,7 @@ static void gsc_unbind(struct device *dev, struct 
> device *master,
>   struct exynos_drm_ipp *ipp = >ipp;
>   
>   exynos_drm_ipp_unregister(dev, ipp);
> - exynos_drm_unregister_dma(drm_dev, dev, &ctx->dma_priv);
> + exynos_drm_unregister_dma(drm_dev, dev);
>   }
>   
>   static const struct component_ops gsc_component_ops = {
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_rotator.c 
> b/drivers/gpu/drm/exynos/exynos_drm_rotator.c
> index 2d94afba031e..f22fa0d2621a 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_rotator.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_rotator.c
> @@ -56,7 +56,6 @@ struct rot_variant {
>   struct rot_context {
>   struct exynos_drm_ipp ipp;
>   struct drm_device *drm_dev;
> - void    *dma_priv;
>   struct device   *dev;
>   void __iomem    *regs;
>   struct clk  *clock;
> @@ -244,7 +243,7 @@ static int rotator_bind(struct device *dev, struct device 
> *master, void *data)
>   
>   rot->drm_dev = drm_dev;
>   ipp->drm_dev = drm_dev;
> - exynos_drm_register_dma(drm_dev, dev, &rot->dma_priv);
> + exynos_drm_register_dma(drm_dev, dev);
>   
>   exynos_drm_ipp_register(dev, ipp, _funcs,
>  DRM_EXYNOS_IPP_CAP_CROP | DRM_EXYNOS_IPP_CAP_ROTATE,
> @@ -262,7 +261,7 @@ static void rotator_unbind(struct device *dev, struct 
> device *master,
>   struct exynos_drm_ipp *ipp = >ipp;
>   
>   exynos_drm_ipp_unregister(dev, ipp);
> - exynos_drm_unregister_dma(rot->drm_dev, rot->dev, &rot->dma_priv);
> + exynos_drm_unregister_dma(rot->drm_dev, rot->dev);
>   }
>   
>   static const struct component_ops rotator_component_ops = {
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_scaler.c 
> b/drivers/gpu/drm/exynos/exynos_drm_scaler.c
> index ce1857138f89..0c560566bd2e 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_scaler.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_scaler.c
> @@ -39,7 +39,6 @@ struct scaler_data {
>   struct scaler_context {
>   struct exynos_drm_ipp   ipp;
>   struct drm_device   *drm_dev;
> - void *dma_priv;
>   struct device   *dev;
>   void __iomem*regs;
>   struct clk  *clock[SCALER_MAX_CLK];
> @@ -451,7 +450,7 @@ static int scaler_bind(struct device *dev, struct device 
> *master, void *data)
>   
>   scaler->drm_dev = drm_dev;
>   ipp->drm_dev = drm_dev;
> - exynos_drm_register_dma(drm_dev, dev, &scaler->dma_priv);
> + exynos_drm_register_dma(drm_dev, dev);
>   
>   exynos_drm_ipp_register(dev, ipp, &ipp_funcs,
>   DRM_EXYNOS_IPP_CAP_CROP | DRM_EXYNOS_IPP_CAP_ROTATE |
> @@ -471,8 +470,7 @@ static void scaler_unbind(struct device *dev, struct 
> device *master,
>   struct exynos_drm_ipp *ipp = &scaler->ipp;
>   
>   exynos_drm_ipp_unregister(dev, ipp);
> - exynos_drm_unregister_dma(scaler->drm_dev, scaler->dev,
> -   &scaler->dma_priv);
> + exynos_drm_unregister_dma(scaler->drm_dev, scaler->dev);
>   }
>   
>   static const struct component_ops scaler_component_ops = {
> diff --git a/drivers/gpu/drm/exynos/exynos_mixer.c 
> b/drivers/gpu/drm/exynos/exynos_mixer.c
> index af192e5a16ef..18972e605c5d 100644
> --- a/drivers/gpu/drm/exynos/exynos_mixer.c
> +++ b/drivers/gpu/drm/exynos/exynos_mixer.c
> @@ -94,7 +94,6 @@ struct mixer_context {
>   struct platform_device *pdev;
>   struct device   *dev;
>   struct drm_device   *drm_dev;
> - void *dma_priv;
>   struct exynos_drm_crtc  *crtc;
>   struct exynos_drm_plane planes[MIXER_WIN_NR];
>   unsigned long   flags;
> @@ -895,14 +894,12 @@ static int mixer_initialize(struct mixer_context 
> *mixer_ctx,
>   }
>   }
>   
> - return exynos_drm_register_dma(drm_dev, mixer_ctx->dev,
> -    &mixer_ctx->dma_priv);
> + return exynos_drm_register_dma(drm_dev, mixer_ctx->dev);
>   }
>   
>   static void mixer_ctx_remove(struct mixer_context *mixer_ctx)
>   {
> - exynos_drm_unregister_dma(mixer_ctx->drm_dev, mixer_ctx->dev,
> -   &mixer_ctx->dma_priv);
> + exynos_drm_unregister_dma(mixer_ctx->drm_dev, mixer_ctx->dev);
>   }
>   
>   static int mixer_enable_vblank(struct exynos_drm_crtc *crtc)

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH v10 30/30] videobuf2: use sgtable-based scatterlist wrappers

2020-09-07 Thread Marek Szyprowski
Hi Tomasz,

On 07.09.2020 15:07, Tomasz Figa wrote:
> On Fri, Sep 4, 2020 at 3:35 PM Marek Szyprowski
>  wrote:
>> Use recently introduced common wrappers operating directly on the struct
>> sg_table objects and scatterlist page iterators to make the code a bit
>> more compact, robust, easier to follow and copy/paste safe.
>>
>> No functional change, because the code already properly did all the
>> scatterlist related calls.
>>
>> Signed-off-by: Marek Szyprowski 
>> Reviewed-by: Robin Murphy 
>> ---
>>   .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
>>   .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
>>   .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
>>   3 files changed, 31 insertions(+), 47 deletions(-)
>>
> Thanks for the patch! Please see my comments inline.
>
>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
>> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> index ec3446cc45b8..1b242d844dde 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> @@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct 
>> sg_table *sgt)
>>  unsigned int i;
>>  unsigned long size = 0;
>>
>> -   for_each_sg(sgt->sgl, s, sgt->nents, i) {
>> +   for_each_sgtable_dma_sg(sgt, s, i) {
>>  if (sg_dma_address(s) != expected)
>>  break;
>> -   expected = sg_dma_address(s) + sg_dma_len(s);
>> +   expected += sg_dma_len(s);
>>  size += sg_dma_len(s);
>>  }
>>  return size;
>> @@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
>>  if (!sgt)
>>  return;
>>
>> -   dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
>> -  buf->dma_dir);
>> +   dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
>>   }
>>
>>   static void vb2_dc_finish(void *buf_priv)
>> @@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
>>  if (!sgt)
>>  return;
>>
>> -   dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, 
>> buf->dma_dir);
>> +   dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
>>   }
>>
>>   /*/
>> @@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf 
>> *dbuf,
>>   * memory locations do not require any explicit cache
>>   * maintenance prior or after being used by the device.
>>   */
>> -   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
>> -  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
>> +   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
>> + DMA_ATTR_SKIP_CPU_SYNC);
>>  sg_free_table(sgt);
>>  kfree(attach);
>>  db_attach->priv = NULL;
>> @@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
>>
>>  /* release any previous cache */
>>  if (attach->dma_dir != DMA_NONE) {
>> -   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
>> -  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
>> +   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
>> + DMA_ATTR_SKIP_CPU_SYNC);
>>  attach->dma_dir = DMA_NONE;
>>  }
>>
>> @@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
>>   * mapping to the client with new direction, no cache sync
>>   * required see comment in vb2_dc_dmabuf_ops_detach()
>>   */
>> -   sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, 
>> sgt->orig_nents,
>> - dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
>> -   if (!sgt->nents) {
>> +   if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
>> +   DMA_ATTR_SKIP_CPU_SYNC)) {
>>  pr_err("failed to map scatterlist\n");
>>      mutex_unlock(lock);
>>  return ERR_PTR(-EIO);
> As opposed to dma_map_sg_attrs(), dma_map_sgtable() now returns an
> error c

[PATCH v10 09/30] drm: lima: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).
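
For reference, the structure in question (include/linux/scatterlist.h):

	struct sg_table {
		struct scatterlist *sgl;	/* the list */
		unsigned int nents;		/* number of mapped entries */
		unsigned int orig_nents;	/* original size of list */
	};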

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
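
To make the failure mode concrete, a sketch of the two patterns (error
handling trimmed, dir standing for any dma_data_direction):

	/* open-coded: every call site must pick the right count */
	sgt->nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, dir);
	if (!sgt->nents)
		return -ENOMEM;
	/* ... */
	dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, dir); /* orig_nents! */

	/* wrapper-based: the counts never leave the struct sg_table */
	ret = dma_map_sgtable(dev, sgt, dir, 0);
	if (ret)
		return ret;
	/* ... */
	dma_unmap_sgtable(dev, sgt, dir, 0);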

Signed-off-by: Marek Szyprowski 
Reviewed-by: Qiang Yu 
---
 drivers/gpu/drm/lima/lima_gem.c | 11 ---
 drivers/gpu/drm/lima/lima_vm.c  |  5 ++---
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index 155f2b4b4030..11223fe348df 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -69,8 +69,7 @@ int lima_heap_alloc(struct lima_bo *bo, struct lima_vm *vm)
return ret;
 
if (bo->base.sgt) {
-   dma_unmap_sg(dev, bo->base.sgt->sgl,
-bo->base.sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(dev, bo->base.sgt, DMA_BIDIRECTIONAL, 0);
sg_free_table(bo->base.sgt);
} else {
bo->base.sgt = kmalloc(sizeof(*bo->base.sgt), GFP_KERNEL);
@@ -80,7 +79,13 @@ int lima_heap_alloc(struct lima_bo *bo, struct lima_vm *vm)
}
}
 
-   dma_map_sg(dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL);
+   ret = dma_map_sgtable(dev, &sgt, DMA_BIDIRECTIONAL, 0);
+   if (ret) {
+   sg_free_table(&sgt);
+   kfree(bo->base.sgt);
+   bo->base.sgt = NULL;
+   return ret;
+   }
 
*bo->base.sgt = sgt;
 
diff --git a/drivers/gpu/drm/lima/lima_vm.c b/drivers/gpu/drm/lima/lima_vm.c
index 5b92fb82674a..2b2739adc7f5 100644
--- a/drivers/gpu/drm/lima/lima_vm.c
+++ b/drivers/gpu/drm/lima/lima_vm.c
@@ -124,7 +124,7 @@ int lima_vm_bo_add(struct lima_vm *vm, struct lima_bo *bo, 
bool create)
if (err)
goto err_out1;
 
-   for_each_sg_dma_page(bo->base.sgt->sgl, &sg_iter, bo->base.sgt->nents, 
0) {
+   for_each_sgtable_dma_page(bo->base.sgt, &sg_iter, 0) {
err = lima_vm_map_page(vm, sg_page_iter_dma_address(&sg_iter),
   bo_va->node.start + offset);
if (err)
@@ -298,8 +298,7 @@ int lima_vm_map_bo(struct lima_vm *vm, struct lima_bo *bo, 
int pageoff)
mutex_lock(&vm->lock);
 
base = bo_va->node.start + (pageoff << PAGE_SHIFT);
-   for_each_sg_dma_page(bo->base.sgt->sgl, &sg_iter,
-bo->base.sgt->nents, pageoff) {
+   for_each_sgtable_dma_page(bo->base.sgt, &sg_iter, pageoff) {
err = lima_vm_map_page(vm, sg_page_iter_dma_address(&sg_iter),
   base + offset);
if (err)
-- 
2.17.1



[PATCH v10 06/30] drm: exynos: use common helper for a scatterlist contiguity check

2020-09-04 Thread Marek Szyprowski
Use common helper for checking the contiguity of the imported dma-buf.
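
The helper, added to drm_prime.c earlier in this series, boils down to
roughly the following:

	unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
	{
		dma_addr_t expected = sg_dma_address(sgt->sgl);
		struct scatterlist *sg;
		unsigned long size = 0;
		int i;

		for_each_sgtable_dma_sg(sgt, sg, i) {
			unsigned int len = sg_dma_len(sg);

			if (!len || sg_dma_address(sg) != expected)
				break;
			expected += len;
			size += len;
		}
		return size;
	}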

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
Acked-by: Inki Dae 
---
 drivers/gpu/drm/exynos/exynos_drm_gem.c | 23 +++
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c 
b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index efa476858db5..1716a023bca0 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -431,27 +431,10 @@ exynos_drm_gem_prime_import_sg_table(struct drm_device 
*dev,
 {
struct exynos_drm_gem *exynos_gem;
 
-   if (sgt->nents < 1)
+   /* check if the entries in the sg_table are contiguous */
+   if (drm_prime_get_contiguous_size(sgt) < attach->dmabuf->size) {
+   DRM_ERROR("buffer chunks must be mapped contiguously");
return ERR_PTR(-EINVAL);
-
-   /*
-* Check if the provided buffer has been mapped as contiguous
-* into DMA address space.
-*/
-   if (sgt->nents > 1) {
-   dma_addr_t next_addr = sg_dma_address(sgt->sgl);
-   struct scatterlist *s;
-   unsigned int i;
-
-   for_each_sg(sgt->sgl, s, sgt->nents, i) {
-   if (!sg_dma_len(s))
-   break;
-   if (sg_dma_address(s) != next_addr) {
-   DRM_ERROR("buffer chunks must be mapped 
contiguously");
-   return ERR_PTR(-EINVAL);
-   }
-   next_addr = sg_dma_address(s) + sg_dma_len(s);
-   }
}
 
exynos_gem = exynos_drm_gem_init(dev, attach->dmabuf->size);
-- 
2.17.1



[PATCH v10 07/30] drm: exynos: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
Acked-by: Inki Dae 
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 03be31427181..967a5cdc120e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -395,8 +395,8 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
return;
 
 out:
-   dma_unmap_sg(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt->sgl,
-   g2d_userptr->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
+ DMA_BIDIRECTIONAL, 0);
 
pages = frame_vector_pages(g2d_userptr->vec);
if (!IS_ERR(pages)) {
@@ -511,10 +511,10 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
g2d_data *g2d,
 
g2d_userptr->sgt = sgt;
 
-   if (!dma_map_sg(to_dma_dev(g2d->drm_dev), sgt->sgl, sgt->nents,
-   DMA_BIDIRECTIONAL)) {
+   ret = dma_map_sgtable(to_dma_dev(g2d->drm_dev), sgt,
+ DMA_BIDIRECTIONAL, 0);
+   if (ret) {
DRM_DEV_ERROR(g2d->dev, "failed to map sgt with dma region.\n");
-   ret = -ENOMEM;
goto err_sg_free_table;
}
 
-- 
2.17.1



[PATCH v10 30/30] videobuf2: use sgtable-based scatterlist wrappers

2020-09-04 Thread Marek Szyprowski
Use recently introduced common wrappers operating directly on the struct
sg_table objects and scatterlist page iterators to make the code a bit
more compact, robust, easier to follow and copy/paste safe.

No functional change, because the code already properly did all the
scatterlist related calls.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
 .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
 .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
 3 files changed, 31 insertions(+), 47 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index ec3446cc45b8..1b242d844dde 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct 
sg_table *sgt)
unsigned int i;
unsigned long size = 0;
 
-   for_each_sg(sgt->sgl, s, sgt->nents, i) {
+   for_each_sgtable_dma_sg(sgt, s, i) {
if (sg_dma_address(s) != expected)
break;
-   expected = sg_dma_address(s) + sg_dma_len(s);
+   expected += sg_dma_len(s);
size += sg_dma_len(s);
}
return size;
@@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
if (!sgt)
return;
 
-   dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
-  buf->dma_dir);
+   dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
 }
 
 static void vb2_dc_finish(void *buf_priv)
@@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
if (!sgt)
return;
 
-   dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);
+   dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
 }
 
 /*/
@@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
 * memory locations do not require any explicit cache
 * maintenance prior or after being used by the device.
 */
-   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
-  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sgt);
kfree(attach);
db_attach->priv = NULL;
@@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
 
/* release any previous cache */
if (attach->dma_dir != DMA_NONE) {
-   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
-  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
attach->dma_dir = DMA_NONE;
}
 
@@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
 * mapping to the client with new direction, no cache sync
 * required see comment in vb2_dc_dmabuf_ops_detach()
 */
-   sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
- dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
-   if (!sgt->nents) {
+   if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
+   DMA_ATTR_SKIP_CPU_SYNC)) {
pr_err("failed to map scatterlist\n");
mutex_unlock(lock);
return ERR_PTR(-EIO);
@@ -455,8 +453,8 @@ static void vb2_dc_put_userptr(void *buf_priv)
 * No need to sync to CPU, it's already synced to the CPU
 * since the finish() memop will have been called before this.
 */
-   dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
-  buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(buf->dev, sgt, buf->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
pages = frame_vector_pages(buf->vec);
/* sgt should exist only if vector contains pages... */
BUG_ON(IS_ERR(pages));
@@ -553,9 +551,8 @@ static void *vb2_dc_get_userptr(struct device *dev, 
unsigned long vaddr,
 * No need to sync to the device, this will happen later when the
 * prepare() memop is called.
 */
-   sgt->nents = dma_map_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
- buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
-   if (sgt->ne

[PATCH v10 26/30] staging: tegra-vde: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
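
For the IOMMU side, the sgtable helper is a thin inline wrapper; its
definition in include/linux/iommu.h is essentially the following, so it
maps orig_nents entries and returns the number of bytes mapped (zero on
failure):

	static inline size_t iommu_map_sgtable(struct iommu_domain *domain,
					       unsigned long iova,
					       struct sg_table *sgt, int prot)
	{
		return iommu_map_sg(domain, iova, sgt->sgl, sgt->orig_nents,
				    prot);
	}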

Signed-off-by: Marek Szyprowski 
Reviewed-by: Dmitry Osipenko 
---
 drivers/staging/media/tegra-vde/iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/media/tegra-vde/iommu.c 
b/drivers/staging/media/tegra-vde/iommu.c
index 6af863d92123..adf8dc7ee25c 100644
--- a/drivers/staging/media/tegra-vde/iommu.c
+++ b/drivers/staging/media/tegra-vde/iommu.c
@@ -36,8 +36,8 @@ int tegra_vde_iommu_map(struct tegra_vde *vde,
 
addr = iova_dma_addr(&vde->iova, iova);
 
-   size = iommu_map_sg(vde->domain, addr, sgt->sgl, sgt->nents,
-   IOMMU_READ | IOMMU_WRITE);
+   size = iommu_map_sgtable(vde->domain, addr, sgt,
+IOMMU_READ | IOMMU_WRITE);
if (!size) {
__free_iova(&vde->iova, iova);
return -ENXIO;
-- 
2.17.1



[PATCH v10 20/30] drm: vmwgfx: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Acked-by: Roland Scheidegger 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index c7f10b2c93d2..13c31e2d7254 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -362,8 +362,7 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
*vmw_tt)
 {
struct device *dev = vmw_tt->dev_priv->dev->dev;
 
-   dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
-   DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(dev, &vmw_tt->sgt, DMA_BIDIRECTIONAL, 0);
vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
 }
 
@@ -383,16 +382,8 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
*vmw_tt)
 static int vmw_ttm_map_for_dma(struct vmw_ttm_tt *vmw_tt)
 {
struct device *dev = vmw_tt->dev_priv->dev->dev;
-   int ret;
-
-   ret = dma_map_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.orig_nents,
-DMA_BIDIRECTIONAL);
-   if (unlikely(ret == 0))
-   return -ENOMEM;
 
-   vmw_tt->sgt.nents = ret;
-
-   return 0;
+   return dma_map_sgtable(dev, &vmw_tt->sgt, DMA_BIDIRECTIONAL, 0);
 }
 
 /**
@@ -449,10 +440,10 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
if (unlikely(ret != 0))
goto out_sg_alloc_fail;
 
-   if (vsgt->num_pages > vmw_tt->sgt.nents) {
+   if (vsgt->num_pages > vmw_tt->sgt.orig_nents) {
uint64_t over_alloc =
sgl_size * (vsgt->num_pages -
-   vmw_tt->sgt.nents);
+   vmw_tt->sgt.orig_nents);
 
ttm_mem_global_free(glob, over_alloc);
vmw_tt->sg_alloc_size -= over_alloc;
-- 
2.17.1



[PATCH v10 05/30] drm: etnaviv: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 12 +---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 15 ---
 2 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index f06e19e7be04..eaf1949bc2e4 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -27,7 +27,7 @@ static void etnaviv_gem_scatter_map(struct etnaviv_gem_object 
*etnaviv_obj)
 * because display controller, GPU, etc. are not coherent.
 */
if (etnaviv_obj->flags & ETNA_BO_CACHE_MASK)
-   dma_map_sg(dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+   dma_map_sgtable(dev->dev, sgt, DMA_BIDIRECTIONAL, 0);
 }
 
 static void etnaviv_gem_scatterlist_unmap(struct etnaviv_gem_object 
*etnaviv_obj)
@@ -51,7 +51,7 @@ static void etnaviv_gem_scatterlist_unmap(struct 
etnaviv_gem_object *etnaviv_obj
 * discard those writes.
 */
if (etnaviv_obj->flags & ETNA_BO_CACHE_MASK)
-   dma_unmap_sg(dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(dev->dev, sgt, DMA_BIDIRECTIONAL, 0);
 }
 
 /* called with etnaviv_obj->lock held */
@@ -404,9 +404,8 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
}
 
if (etnaviv_obj->flags & ETNA_BO_CACHED) {
-   dma_sync_sg_for_cpu(dev->dev, etnaviv_obj->sgt->sgl,
-   etnaviv_obj->sgt->nents,
-   etnaviv_op_to_dma_dir(op));
+   dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
+etnaviv_op_to_dma_dir(op));
etnaviv_obj->last_cpu_prep_op = op;
}
 
@@ -421,8 +420,7 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
if (etnaviv_obj->flags & ETNA_BO_CACHED) {
/* fini without a prep is almost certainly a userspace error */
WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
-   dma_sync_sg_for_device(dev->dev, etnaviv_obj->sgt->sgl,
-   etnaviv_obj->sgt->nents,
+   dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
etnaviv_obj->last_cpu_prep_op = 0;
}
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
index 3607d348c298..15d9fa3879e5 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
@@ -73,13 +73,13 @@ static int etnaviv_iommu_map(struct etnaviv_iommu_context 
*context, u32 iova,
 struct sg_table *sgt, unsigned len, int prot)
{
	struct scatterlist *sg;
unsigned int da = iova;
-   unsigned int i, j;
+   unsigned int i;
int ret;
 
if (!context || !sgt)
return -EINVAL;
 
-   for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+   for_each_sgtable_dma_sg(sgt, sg, i) {
u32 pa = sg_dma_address(sg) - sg->offset;
size_t bytes = sg_dma_len(sg) + sg->offset;
 
@@ -95,14 +95,7 @@ static int etnaviv_iommu_map(struct etnaviv_iommu_context 
*context, u32 iova,
return 0;
 
 fail:
-   da = iova;
-
-   for_each_sg(sgt->sgl, sg, i, j) {
-   size_t bytes = sg_dma_len(sg) + sg->offset;
-
-   etnaviv_context_unmap(context, da, bytes);
-   da += bytes;
-   }
+   etnaviv_context_unmap(context, iova, da - iova);
return ret;
 }
 
@@ -1

[PATCH v10 19/30] drm: virtio: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Acked-by: Gerd Hoffmann 
---
 drivers/gpu/drm/virtio/virtgpu_object.c | 36 ++---
 drivers/gpu/drm/virtio/virtgpu_vq.c | 12 -
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
b/drivers/gpu/drm/virtio/virtgpu_object.c
index 842f8b61aa89..00d6b95e259d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -72,9 +72,8 @@ void virtio_gpu_cleanup_object(struct virtio_gpu_object *bo)
 
if (shmem->pages) {
if (shmem->mapped) {
-   dma_unmap_sg(vgdev->vdev->dev.parent,
-shmem->pages->sgl, shmem->mapped,
-DMA_TO_DEVICE);
+   dma_unmap_sgtable(vgdev->vdev->dev.parent,
+shmem->pages, DMA_TO_DEVICE, 0);
shmem->mapped = 0;
}
 
@@ -164,13 +163,13 @@ static int virtio_gpu_object_shmem_init(struct 
virtio_gpu_device *vgdev,
}
 
if (use_dma_api) {
-   shmem->mapped = dma_map_sg(vgdev->vdev->dev.parent,
-  shmem->pages->sgl,
-  shmem->pages->nents,
-  DMA_TO_DEVICE);
-   *nents = shmem->mapped;
+   ret = dma_map_sgtable(vgdev->vdev->dev.parent,
+ shmem->pages, DMA_TO_DEVICE, 0);
+   if (ret)
+   return ret;
+   *nents = shmem->mapped = shmem->pages->nents;
} else {
-   *nents = shmem->pages->nents;
+   *nents = shmem->pages->orig_nents;
}
 
*ents = kmalloc_array(*nents, sizeof(struct virtio_gpu_mem_entry),
@@ -180,13 +179,20 @@ static int virtio_gpu_object_shmem_init(struct 
virtio_gpu_device *vgdev,
return -ENOMEM;
}
 
-   for_each_sg(shmem->pages->sgl, sg, *nents, si) {
-   (*ents)[si].addr = cpu_to_le64(use_dma_api
-  ? sg_dma_address(sg)
-  : sg_phys(sg));
-   (*ents)[si].length = cpu_to_le32(sg->length);
-   (*ents)[si].padding = 0;
+   if (use_dma_api) {
+   for_each_sgtable_dma_sg(shmem->pages, sg, si) {
+   (*ents)[si].addr = cpu_to_le64(sg_dma_address(sg));
+   (*ents)[si].length = cpu_to_le32(sg_dma_len(sg));
+   (*ents)[si].padding = 0;
+   }
+   } else {
+   for_each_sgtable_sg(shmem->pages, sg, si) {
+   (*ents)[si].addr = cpu_to_le64(sg_phys(sg));
+   (*ents)[si].length = cpu_to_le32(sg->length);
+   (*ents)[si].padding = 0;
+   }
}
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
b/drivers/gpu/drm/virtio/virtgpu_vq.c
index c93c2db35aaf..651d1b0e8e8d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -302,7 +302,7 @@ static struct sg_table *vmalloc_to_sgt(char *data, uint32_t 
size, int *sg_ents)
return NULL;
}
 
-   for_each_sg(sgt->sgl, sg, *sg_ents, i) {
+   for_each_sgtable_sg(sgt, sg, i) {
pg = vmalloc_to_page(data);
if (!pg) {
sg_free_table(sgt);
@@ -603,9 +603,8 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct 
virtio_gpu_devic

[PATCH v10 18/30] drm: v3d: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
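
The interesting detail in this driver is the iterator change:
for_each_sgtable_dma_page() walks the DMA-mapped side of the table in
fixed PAGE_SIZE steps regardless of segment lengths, which is why the
loop below can assume PAGE_SIZE per iteration instead of sg_dma_len().
A minimal usage sketch:

	struct sg_dma_page_iter iter;

	for_each_sgtable_dma_page(sgt, &iter, 0) {
		dma_addr_t addr = sg_page_iter_dma_address(&iter);

		/* addr advances by PAGE_SIZE on each iteration */
	}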

Signed-off-by: Marek Szyprowski 
Reviewed-by: Eric Anholt 
---
 drivers/gpu/drm/v3d/v3d_mmu.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 3b81ea28c0bb..5a453532901f 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -90,18 +90,17 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev);
u32 page = bo->node.start;
u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
-   unsigned int count;
-   struct scatterlist *sgl;
+   struct sg_dma_page_iter dma_iter;
 
-   for_each_sg(shmem_obj->sgt->sgl, sgl, shmem_obj->sgt->nents, count) {
-   u32 page_address = sg_dma_address(sgl) >> V3D_MMU_PAGE_SHIFT;
+   for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
+   dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
+   u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT;
u32 pte = page_prot | page_address;
u32 i;
 
-   BUG_ON(page_address + (sg_dma_len(sgl) >> V3D_MMU_PAGE_SHIFT) >=
+   BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
   BIT(24));
-
-   for (i = 0; i < sg_dma_len(sgl) >> V3D_MMU_PAGE_SHIFT; i++)
+   for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
v3d->pt[page++] = pte + i;
}
 
-- 
2.17.1



[PATCH v10 10/30] drm: mediatek: use common helper for a scatterlist contiguity check

2020-09-04 Thread Marek Szyprowski
Use common helper for checking the contiguity of the imported dma-buf and
do this check before allocating resources, so the error path is simpler.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
Acked-by: Chun-Kuang Hu 
---
 drivers/gpu/drm/mediatek/mtk_drm_gem.c | 28 ++
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c 
b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
index 6190cc3b7b0d..3654ec732029 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -212,37 +212,21 @@ struct drm_gem_object 
*mtk_gem_prime_import_sg_table(struct drm_device *dev,
struct dma_buf_attachment *attach, struct sg_table *sg)
 {
struct mtk_drm_gem_obj *mtk_gem;
-   int ret;
-   struct scatterlist *s;
-   unsigned int i;
-   dma_addr_t expected;
 
-   mtk_gem = mtk_drm_gem_init(dev, attach->dmabuf->size);
+   /* check if the entries in the sg_table are contiguous */
+   if (drm_prime_get_contiguous_size(sg) < attach->dmabuf->size) {
+   DRM_ERROR("sg_table is not contiguous");
+   return ERR_PTR(-EINVAL);
+   }
 
+   mtk_gem = mtk_drm_gem_init(dev, attach->dmabuf->size);
if (IS_ERR(mtk_gem))
return ERR_CAST(mtk_gem);
 
-   expected = sg_dma_address(sg->sgl);
-   for_each_sg(sg->sgl, s, sg->nents, i) {
-   if (!sg_dma_len(s))
-   break;
-
-   if (sg_dma_address(s) != expected) {
-   DRM_ERROR("sg_table is not contiguous");
-   ret = -EINVAL;
-   goto err_gem_free;
-   }
-   expected = sg_dma_address(s) + sg_dma_len(s);
-   }
-
mtk_gem->dma_addr = sg_dma_address(sg->sgl);
mtk_gem->sg = sg;
 
return &mtk_gem->base;
-
-err_gem_free:
-   kfree(mtk_gem);
-   return ERR_PTR(ret);
 }
 
 void *mtk_drm_gem_prime_vmap(struct drm_gem_object *obj)
-- 
2.17.1



[PATCH v10 12/30] drm: msm: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Acked-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c| 13 +
 drivers/gpu/drm/msm/msm_gpummu.c | 15 +++
 drivers/gpu/drm/msm/msm_iommu.c  |  2 +-
 3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index b2f49152b4d4..8c7ae812b813 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -53,11 +53,10 @@ static void sync_for_device(struct msm_gem_object *msm_obj)
struct device *dev = msm_obj->base.dev->dev;
 
if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) {
-   dma_sync_sg_for_device(dev, msm_obj->sgt->sgl,
-   msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_sync_sgtable_for_device(dev, msm_obj->sgt,
+   DMA_BIDIRECTIONAL);
} else {
-   dma_map_sg(dev, msm_obj->sgt->sgl,
-   msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_map_sgtable(dev, msm_obj->sgt, DMA_BIDIRECTIONAL, 0);
}
 }
 
@@ -66,11 +65,9 @@ static void sync_for_cpu(struct msm_gem_object *msm_obj)
struct device *dev = msm_obj->base.dev->dev;
 
if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) {
-   dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl,
-   msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_sync_sgtable_for_cpu(dev, msm_obj->sgt, DMA_BIDIRECTIONAL);
} else {
-   dma_unmap_sg(dev, msm_obj->sgt->sgl,
-   msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(dev, msm_obj->sgt, DMA_BIDIRECTIONAL, 0);
}
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c
index 310a31b05faa..53a7348476a1 100644
--- a/drivers/gpu/drm/msm/msm_gpummu.c
+++ b/drivers/gpu/drm/msm/msm_gpummu.c
@@ -30,21 +30,20 @@ static int msm_gpummu_map(struct msm_mmu *mmu, uint64_t 
iova,
 {
struct msm_gpummu *gpummu = to_msm_gpummu(mmu);
unsigned idx = (iova - GPUMMU_VA_START) / GPUMMU_PAGE_SIZE;
-   struct scatterlist *sg;
+   struct sg_dma_page_iter dma_iter;
unsigned prot_bits = 0;
-   unsigned i, j;
 
if (prot & IOMMU_WRITE)
prot_bits |= 1;
if (prot & IOMMU_READ)
prot_bits |= 2;
 
-   for_each_sg(sgt->sgl, sg, sgt->nents, i) {
-   dma_addr_t addr = sg->dma_address;
-   for (j = 0; j < sg->length / GPUMMU_PAGE_SIZE; j++, idx++) {
-   gpummu->table[idx] = addr | prot_bits;
-   addr += GPUMMU_PAGE_SIZE;
-   }
+   for_each_sgtable_dma_page(sgt, &dma_iter, 0) {
+   dma_addr_t addr = sg_page_iter_dma_address(&dma_iter);
+   int i;
+
+   for (i = 0; i < PAGE_SIZE; i += GPUMMU_PAGE_SIZE)
+   gpummu->table[idx++] = (addr + i) | prot_bits;
}
 
/* we can improve by deferring flush for multiple map() */
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 3a381a9674c9..6c31e65834c6 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -36,7 +36,7 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
struct msm_iommu *iommu = to_msm_iommu(mmu);
size_t ret;
 
-   ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
+   ret = iommu_map_sgtable(iommu->domain, iova, sgt, prot);
WARN_ON(!ret);
 
return (ret == len) ? 0 : -EINVAL;
-- 
2.17.1



[PATCH v10 15/30] drm: rockchip: use common helper for a scatterlist contiguity check

2020-09-04 Thread Marek Szyprowski
Use common helper for checking the contiguity of the imported dma-buf.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index b9275ba7c5a5..2970e534e2bb 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -460,23 +460,6 @@ struct sg_table *rockchip_gem_prime_get_sg_table(struct 
drm_gem_object *obj)
return sgt;
 }
 
-static unsigned long rockchip_sg_get_contiguous_size(struct sg_table *sgt,
-int count)
-{
-   struct scatterlist *s;
-   dma_addr_t expected = sg_dma_address(sgt->sgl);
-   unsigned int i;
-   unsigned long size = 0;
-
-   for_each_sg(sgt->sgl, s, count, i) {
-   if (sg_dma_address(s) != expected)
-   break;
-   expected = sg_dma_address(s) + sg_dma_len(s);
-   size += sg_dma_len(s);
-   }
-   return size;
-}
-
 static int
 rockchip_gem_iommu_map_sg(struct drm_device *drm,
  struct dma_buf_attachment *attach,
@@ -498,7 +481,7 @@ rockchip_gem_dma_map_sg(struct drm_device *drm,
if (!count)
return -EINVAL;
 
-   if (rockchip_sg_get_contiguous_size(sg, count) < attach->dmabuf->size) {
+   if (drm_prime_get_contiguous_size(sg) < attach->dmabuf->size) {
DRM_ERROR("failed to map sg_table to contiguous linear 
address.\n");
dma_unmap_sg(drm->dev, sg->sgl, sg->nents,
 DMA_BIDIRECTIONAL);
-- 
2.17.1



[PATCH v10 22/30] xen: gntdev: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Acked-by: Juergen Gross 
---
 drivers/xen/gntdev-dmabuf.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
index b1b6eebafd5d..4c13cbc99896 100644
--- a/drivers/xen/gntdev-dmabuf.c
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -247,10 +247,9 @@ static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
 
if (sgt) {
if (gntdev_dmabuf_attach->dir != DMA_NONE)
-   dma_unmap_sg_attrs(attach->dev, sgt->sgl,
-  sgt->nents,
-  gntdev_dmabuf_attach->dir,
-  DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(attach->dev, sgt,
+ gntdev_dmabuf_attach->dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sgt);
}
 
@@ -288,8 +287,8 @@ dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment 
*attach,
sgt = dmabuf_pages_to_sgt(gntdev_dmabuf->pages,
  gntdev_dmabuf->nr_pages);
if (!IS_ERR(sgt)) {
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
+   if (dma_map_sgtable(attach->dev, sgt, dir,
+   DMA_ATTR_SKIP_CPU_SYNC)) {
sg_free_table(sgt);
kfree(sgt);
sgt = ERR_PTR(-ENOMEM);
@@ -633,7 +632,7 @@ dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct 
device *dev,
 
/* Now convert sgt to array of pages and check for page validity. */
i = 0;
-   for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0) {
+   for_each_sgtable_page(sgt, &sg_iter, 0) {
struct page *page = sg_page_iter_page(&sg_iter);
/*
 * Check if page is valid: this can happen if we are given
-- 
2.17.1



[PATCH v10 29/30] media: pci: fix common ALSA DMA-mapping related codes

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
number of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

While touching this code, update it to use the modern DMA_FROM_DEVICE
definitions.
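
The "modern definitions" are a straight rename: the legacy PCI macros
are numerically identical to the enum dma_data_direction values (a
sketch of the compat defines of that era):

	#define PCI_DMA_BIDIRECTIONAL	0	/* == DMA_BIDIRECTIONAL */
	#define PCI_DMA_TODEVICE	1	/* == DMA_TO_DEVICE */
	#define PCI_DMA_FROMDEVICE	2	/* == DMA_FROM_DEVICE */
	#define PCI_DMA_NONE		3	/* == DMA_NONE */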

Signed-off-by: Marek Szyprowski 
---
 drivers/media/pci/cx23885/cx23885-alsa.c | 4 ++--
 drivers/media/pci/cx25821/cx25821-alsa.c | 4 ++--
 drivers/media/pci/cx88/cx88-alsa.c   | 6 +++---
 drivers/media/pci/saa7134/saa7134-alsa.c | 4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/media/pci/cx23885/cx23885-alsa.c 
b/drivers/media/pci/cx23885/cx23885-alsa.c
index df44ed7393a0..c797bff6eebb 100644
--- a/drivers/media/pci/cx23885/cx23885-alsa.c
+++ b/drivers/media/pci/cx23885/cx23885-alsa.c
@@ -113,7 +113,7 @@ static int cx23885_alsa_dma_map(struct cx23885_audio_dev 
*dev)
struct cx23885_audio_buffer *buf = dev->buf;
 
buf->sglen = dma_map_sg(&dev->pci->dev, buf->sglist,
-   buf->nr_pages, PCI_DMA_FROMDEVICE);
+   buf->nr_pages, DMA_FROM_DEVICE);
 
if (0 == buf->sglen) {
pr_warn("%s: cx23885_alsa_map_sg failed\n", __func__);
@@ -129,7 +129,7 @@ static int cx23885_alsa_dma_unmap(struct cx23885_audio_dev 
*dev)
if (!buf->sglen)
return 0;
 
-   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->sglen, 
PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->nr_pages, 
DMA_FROM_DEVICE);
buf->sglen = 0;
return 0;
 }
diff --git a/drivers/media/pci/cx25821/cx25821-alsa.c 
b/drivers/media/pci/cx25821/cx25821-alsa.c
index 301616426d8a..8da31c953b02 100644
--- a/drivers/media/pci/cx25821/cx25821-alsa.c
+++ b/drivers/media/pci/cx25821/cx25821-alsa.c
@@ -177,7 +177,7 @@ static int cx25821_alsa_dma_map(struct cx25821_audio_dev 
*dev)
struct cx25821_audio_buffer *buf = dev->buf;
 
buf->sglen = dma_map_sg(&dev->pci->dev, buf->sglist,
-   buf->nr_pages, PCI_DMA_FROMDEVICE);
+   buf->nr_pages, DMA_FROM_DEVICE);
 
if (0 == buf->sglen) {
pr_warn("%s: cx25821_alsa_map_sg failed\n", __func__);
@@ -193,7 +193,7 @@ static int cx25821_alsa_dma_unmap(struct cx25821_audio_dev 
*dev)
if (!buf->sglen)
return 0;
 
-   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->sglen, 
PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->nr_pages, 
DMA_FROM_DEVICE);
buf->sglen = 0;
return 0;
 }
diff --git a/drivers/media/pci/cx88/cx88-alsa.c 
b/drivers/media/pci/cx88/cx88-alsa.c
index 7d7aceecc985..d38633bc1330 100644
--- a/drivers/media/pci/cx88/cx88-alsa.c
+++ b/drivers/media/pci/cx88/cx88-alsa.c
@@ -316,7 +316,7 @@ static int cx88_alsa_dma_map(struct cx88_audio_dev *dev)
struct cx88_audio_buffer *buf = dev->buf;
 
buf->sglen = dma_map_sg(&dev->pci->dev, buf->sglist,
-   buf->nr_pages, PCI_DMA_FROMDEVICE);
+   buf->nr_pages, DMA_FROM_DEVICE);
 
if (buf->sglen == 0) {
pr_warn("%s: cx88_alsa_map_sg failed\n", __func__);
@@ -332,8 +332,8 @@ static int cx88_alsa_dma_unmap(struct cx88_audio_dev *dev)
if (!buf->sglen)
return 0;
 
-   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->sglen,
-PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->nr_pages,
+DMA_FROM_DEVICE);
buf->sglen = 0;
return 0;
 }
diff --git a/drivers/media/pci/saa7134/saa7134-alsa.c 
b/drivers/media/pci/saa7134/saa7134-alsa.c
index 544ca57eee75..707ca77221dc 100644
--- a/drivers/media/pci/saa7134/saa7134-alsa.c
+++ b/drivers/media/pci/saa7134/saa7134-alsa.c
@@ -297,7 +297,7 @@ static int saa7134_alsa_dma_map(struct saa7134_dev *dev)
struct saa7134_dmasound *dma = &dev->dmasound;
 
dma->sglen = dma_map_sg(&dev->pci->dev, dma->sglist,
-   dma->nr_pages, PCI_DMA_FROMDEVICE);
+   dma->nr_pages, DMA_FROM_DEVICE);
 
if (0 == dma->sglen) {
pr_warn("%s: saa7134_alsa_map_sg failed\n", __func__);
@@ -313,7 +313,7 @@ static int saa7134_alsa_dma_unmap(struct saa7134_dev *dev)
if (!dma->sglen)
return 0;
 
-   dma_unmap_sg(&dev->pci->dev, dma->sglist, dma->sglen, 
PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&dev->pci->dev, dma->sglist, dma->nr_pages, 
DMA_FROM_DEVICE);
dma->sglen = 0;
return 0;
 }
-- 
2.17.1



[PATCH v10 25/30] dmabuf: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Acked-by: Gerd Hoffmann 
---
 drivers/dma-buf/heaps/heap-helpers.c | 13 ++---
 drivers/dma-buf/udmabuf.c|  7 +++
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/dma-buf/heaps/heap-helpers.c 
b/drivers/dma-buf/heaps/heap-helpers.c
index 9f964ca3f59c..d0696cf937af 100644
--- a/drivers/dma-buf/heaps/heap-helpers.c
+++ b/drivers/dma-buf/heaps/heap-helpers.c
@@ -140,13 +140,12 @@ struct sg_table *dma_heap_map_dma_buf(struct 
dma_buf_attachment *attachment,
  enum dma_data_direction direction)
 {
struct dma_heaps_attachment *a = attachment->priv;
-   struct sg_table *table;
-
-   table = &a->table;
+   struct sg_table *table = &a->table;
+   int ret;
 
-   if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
-   direction))
-   table = ERR_PTR(-ENOMEM);
+   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   if (ret)
+   table = ERR_PTR(ret);
return table;
 }
 
@@ -154,7 +153,7 @@ static void dma_heap_unmap_dma_buf(struct 
dma_buf_attachment *attachment,
   struct sg_table *table,
   enum dma_data_direction direction)
 {
-   dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
+   dma_unmap_sgtable(attachment->dev, table, direction, 0);
 }
 
 static vm_fault_t dma_heap_vm_fault(struct vm_fault *vmf)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index acb26c627d27..89e293bd9252 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -63,10 +63,9 @@ static struct sg_table *get_sg_table(struct device *dev, 
struct dma_buf *buf,
GFP_KERNEL);
if (ret < 0)
goto err;
-   if (!dma_map_sg(dev, sg->sgl, sg->nents, direction)) {
-   ret = -EINVAL;
+   ret = dma_map_sgtable(dev, sg, direction, 0);
+   if (ret < 0)
goto err;
-   }
return sg;
 
 err:
@@ -78,7 +77,7 @@ static struct sg_table *get_sg_table(struct device *dev, 
struct dma_buf *buf,
 static void put_sg_table(struct device *dev, struct sg_table *sg,
 enum dma_data_direction direction)
 {
-   dma_unmap_sg(dev, sg->sgl, sg->nents, direction);
+   dma_unmap_sgtable(dev, sg, direction, 0);
sg_free_table(sg);
kfree(sg);
 }
-- 
2.17.1



[PATCH v10 03/30] drm: core: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/drm/drm_cache.c|  2 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c | 14 +-
 drivers/gpu/drm/drm_prime.c| 11 ++-
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 03e01b000f7a..0fe3c496002a 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -127,7 +127,7 @@ drm_clflush_sg(struct sg_table *st)
struct sg_page_iter sg_iter;
 
mb(); /*CLFLUSH is ordered only by using memory barriers*/
-   for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
+   for_each_sgtable_page(st, &sg_iter, 0)
drm_clflush_page(sg_page_iter_page(&sg_iter));
mb(); /*Make sure that all cache line entry is flushed*/
 
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 4b7cfbac4daa..47d8211221f2 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -126,8 +126,8 @@ void drm_gem_shmem_free_object(struct drm_gem_object *obj)
drm_prime_gem_destroy(obj, shmem->sgt);
} else {
if (shmem->sgt) {
-   dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl,
-shmem->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(obj->dev->dev, shmem->sgt,
+ DMA_BIDIRECTIONAL, 0);
sg_free_table(shmem->sgt);
kfree(shmem->sgt);
}
@@ -424,8 +424,7 @@ void drm_gem_shmem_purge_locked(struct drm_gem_object *obj)
 
WARN_ON(!drm_gem_shmem_is_purgeable(shmem));
 
-   dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl,
-shmem->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(obj->dev->dev, shmem->sgt, DMA_BIDIRECTIONAL, 0);
sg_free_table(shmem->sgt);
kfree(shmem->sgt);
shmem->sgt = NULL;
@@ -697,12 +696,17 @@ struct sg_table *drm_gem_shmem_get_pages_sgt(struct 
drm_gem_object *obj)
goto err_put_pages;
}
/* Map the pages for use by the h/w. */
-   dma_map_sg(obj->dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+   ret = dma_map_sgtable(obj->dev->dev, sgt, DMA_BIDIRECTIONAL, 0);
+   if (ret)
+   goto err_free_sgt;
 
shmem->sgt = sgt;
 
return sgt;
 
+err_free_sgt:
+   sg_free_table(sgt);
+   kfree(sgt);
 err_put_pages:
drm_gem_shmem_put_pages(shmem);
return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index c5e796d4a489..b8c7f068a5a4 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -617,6 +617,7 @@ struct sg_table *drm_gem_map_dma_buf(struct 
dma_buf_attachment *attach,
 {
struct drm_gem_object *obj = attach->dmabuf->priv;
struct sg_table *sgt;
+   int ret;
 
if (WARN_ON(dir == DMA_NONE))
return ERR_PTR(-EINVAL);
@@ -626,11 +627,12 @@ struct sg_table *drm_gem_map_dma_buf(struct 
dma_buf_attachment *attach,
else
sgt = obj->dev->driver->gem_prime_get_sg_table(obj);
 
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
+   ret = dma_map_sgtable(attach->dev, sgt, dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+   if (ret) {
sg_free_table(sgt);
kfree(sgt);

[PATCH v10 28/30] samples: vfio-mdev/mbochs: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

While touching this code, also add missing call to dma_unmap_sgtable.
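
For context, a sketch of the allocate/map/unwind lifecycle that this and
the related patches converge on; the function name and the pinned 'pages'
array are hypothetical, everything else is the stock kernel API:

#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

static struct sg_table *example_map_pages(struct device *dev,
                                          struct page **pages,
                                          unsigned int npages, size_t size)
{
        struct sg_table *sgt;
        int ret;

        sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
        if (!sgt)
                return ERR_PTR(-ENOMEM);

        ret = sg_alloc_table_from_pages(sgt, pages, npages, 0, size,
                                        GFP_KERNEL);
        if (ret)
                goto err_free;

        ret = dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);
        if (ret)
                goto err_table;

        return sgt;

err_table:
        sg_free_table(sgt);
err_free:
        kfree(sgt);
        return ERR_PTR(ret);
}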

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 samples/vfio-mdev/mbochs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
index 3cc5e5921682..e03068917273 100644
--- a/samples/vfio-mdev/mbochs.c
+++ b/samples/vfio-mdev/mbochs.c
@@ -846,7 +846,7 @@ static struct sg_table *mbochs_map_dmabuf(struct 
dma_buf_attachment *at,
if (sg_alloc_table_from_pages(sg, dmabuf->pages, dmabuf->pagecount,
  0, dmabuf->mode.size, GFP_KERNEL) < 0)
goto err2;
-   if (!dma_map_sg(at->dev, sg->sgl, sg->nents, direction))
+   if (dma_map_sgtable(at->dev, sg, direction, 0))
goto err3;
 
return sg;
@@ -868,6 +868,7 @@ static void mbochs_unmap_dmabuf(struct dma_buf_attachment 
*at,
 
dev_dbg(dev, "%s: %d\n", __func__, dmabuf->id);
 
+   dma_unmap_sgtable(at->dev, sg, direction, 0);
sg_free_table(sg);
kfree(sg);
 }
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 27/30] rapidio: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
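
The hunk below hands req->sgt.nents to do_dma_request() because, after a
successful dma_map_sgtable(), that field holds the DMA-side entry count.
As a sketch of the two counters (function name hypothetical):

#include <linux/dma-mapping.h>
#include <linux/printk.h>
#include <linux/scatterlist.h>

static int example_counters(struct device *dev, struct sg_table *sgt)
{
        int ret = dma_map_sgtable(dev, sgt, DMA_TO_DEVICE, 0);

        if (ret)
                return ret;

        /*
         * sgt->orig_nents: CPU-side entries, set when the table was built.
         * sgt->nents: DMA-side entries, set by the mapping; it may be
         * smaller if an IOMMU merged adjacent chunks. The hardware must be
         * given nents; the teardown helpers use orig_nents internally.
         */
        pr_info("CPU entries %u, DMA entries %u\n",
                sgt->orig_nents, sgt->nents);

        dma_unmap_sgtable(dev, sgt, DMA_TO_DEVICE, 0);
        return 0;
}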

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 drivers/rapidio/devices/rio_mport_cdev.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/rapidio/devices/rio_mport_cdev.c 
b/drivers/rapidio/devices/rio_mport_cdev.c
index a30342942e26..89eb3d212652 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -573,8 +573,7 @@ static void dma_req_free(struct kref *ref)
refcount);
struct mport_cdev_priv *priv = req->priv;
 
-   dma_unmap_sg(req->dmach->device->dev,
-req->sgt.sgl, req->sgt.nents, req->dir);
+   dma_unmap_sgtable(req->dmach->device->dev, &req->sgt, req->dir, 0);
sg_free_table(&req->sgt);
if (req->page_list) {
unpin_user_pages(req->page_list, req->nr_pages);
@@ -814,7 +813,6 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
struct mport_dev *md = priv->md;
struct dma_chan *chan;
int ret;
-   int nents;
 
if (xfer->length == 0)
return -EINVAL;
@@ -930,15 +928,14 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
xfer->offset, xfer->length);
}
 
-   nents = dma_map_sg(chan->device->dev,
-  req->sgt.sgl, req->sgt.nents, dir);
-   if (nents == 0) {
+   ret = dma_map_sgtable(chan->device->dev, &req->sgt, dir, 0);
+   if (ret) {
rmcd_error("Failed to map SG list");
ret = -EFAULT;
goto err_pg;
}
 
-   ret = do_dma_request(req, xfer, sync, nents);
+   ret = do_dma_request(req, xfer, sync, req->sgt.nents);
 
if (ret >= 0) {
if (sync == RIO_TRANSFER_ASYNC)
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 00/30] DRM: fix struct sg_table nents vs. orig_nents misuse

2020-09-04 Thread Marek Szyprowski
Dear All,

During the Exynos DRM GEM rework and fixing the issues in the
drm_prime_sg_to_page_addr_arrays() function [1] I've noticed that most
drivers in DRM framework incorrectly use nents and orig_nents entries of
the struct sg_table.

For most DMA-mapping implementations, exchanging those two entries or
using nents for all loops over the scatterlist is harmless, because they
both have the same value. There exist, however, DMA-mapping
implementations for which such incorrect usage breaks things. The nents
returned by dma_map_sg() might be lower than the nents passed as its
parameter and this is perfectly fine. DMA framework or IOMMU is allowed
to join consecutive chunks while mapping if such an operation is supported
by the underlying HW (bus, bridge, IOMMU, etc). Example of the case
where dma_map_sg() might return 1 'DMA' chunk for the 4 'physical' pages
is described here [2].

The DMA-mapping framework documentation [3] states that dma_map_sg()
returns the number of the created entries in the DMA address space.
However the subsequent calls to dma_sync_sg_for_{device,cpu} and
dma_unmap_sg must be called with the original number of entries passed to
dma_map_sg. The common pattern in DRM drivers was to assign the
dma_map_sg() return value to sg_table->nents and use that value for
the subsequent calls to dma_sync_sg_* or dma_unmap_sg functions. Also
the code iterated nents times to access the pages stored in the
processed scatterlist, while it should use orig_nents as the number of
the page entries.
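
To make the mistake concrete, here is a sketch of the pattern being
removed; handle_page() is a hypothetical consumer, and this code is shown
only as an anti-pattern:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static void handle_page(struct page *page) { }  /* hypothetical consumer */

static void example_broken_pattern(struct device *dev, struct sg_table *sgt,
                                   enum dma_data_direction dir)
{
        struct scatterlist *sg;
        int i;

        /* overwrites the CPU entry count with the DMA entry count */
        sgt->nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, dir);

        /* wrong: walking CPU pages needs orig_nents, not nents */
        for_each_sg(sgt->sgl, sg, sgt->nents, i)
                handle_page(sg_page(sg));

        /* wrong: unmap needs the count originally passed to dma_map_sg() */
        dma_unmap_sg(dev, sgt->sgl, sgt->nents, dir);
}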

I've tried to identify all such incorrect usage of sg_table->nents and
this is the result of my research. It looks like the incorrect pattern has
been copied over the many drivers mainly in the DRM subsystem. Too bad in
most cases it even worked correctly if the system used a simple, linear
DMA-mapping implementation, for which swapping nents and orig_nents
doesn't make any difference. To avoid similar issues in the future, I've
introduced common wrappers for DMA-mapping calls, which operate directly
on the sg_table objects. I've also added wrappers for iterating over the
scatterlists stored in the sg_table objects and applied them where
possible. This, together with some common DRM prime helpers, allowed me
to almost get rid of all nents/orig_nents usage in the drivers. I hope
that such change makes the code robust, easier to follow and copy/paste
safe.

The biggest TODO is the DRM/i915 driver, and I don't feel brave enough to fix
it fully. The driver creatively uses sg_table->orig_nents to store the
size of the allocated scatterlist and ignores the number of the entries
returned by dma_map_sg function. In this patchset I only fixed the
sg_table objects exported by dmabuf related functions. I hope that I
didn't break anything there.

Patches are based on top of Linux next-20200903. The required changes to
the DMA-mapping framework have already been merged in v5.9-rc3.

If possible, I would like to ask for merging most of the patches via the DRM
tree.

Best regards,
Marek Szyprowski


References:

[1] https://lkml.org/lkml/2020/3/27/555
[2] https://lkml.org/lkml/2020/3/29/65
[3] Documentation/DMA-API-HOWTO.txt
[4] 
https://lore.kernel.org/linux-iommu/20200512121931.gd20...@lst.de/T/#ma18c958a48c3b241d5409517fa7d192eef87459b

Changelog:

v10:
- addressed more issues pointed by Robin Murphy in his review:
 * prime: restored WARN_ON() in drm_prime_sg_to_page_addr_arrays()
 * armada: simplified cleanup path
 * msm: fixed arm64 build
 * omapdrm: removed WARN_ON(), which is now in 
drm_prime_sg_to_page_addr_arrays()
 * omapdrm: dropped the incorrect nents/orig_nents patch
 * media: pci: also update to use modern DMA_FROM_DEVICE definitions
- dropped merged patches

v9: 
https://lore.kernel.org/linux-iommu/20200826063316.23486-1-m.szyprow...@samsung.com/T/
- rebased onto Linux next-20200825, which is based on v5.9-rc2; fixed conflicts
- dropped merged patches

v8:
- rapidio: fixed issues pointed by kbuild test robot (use of uninitialized
  variable)
- vb2: rebased after recent changes in the code

v7: 
https://lore.kernel.org/linux-iommu/20200619103636.11974-1-m.szyprow...@samsung.com/
- changed DMA page iterators to standard DMA SG iterators in drm/prime and
  videobuf2-dma-contig as suggested by Robin Murphy
- fixed build issues

v6: 
https://lore.kernel.org/linux-iommu/20200618153956.29558-1-m.szyprow...@samsung.com/T/
- rebased onto Linux next-20200618, which is based on v5.8-rc1; fixed conflicts

v5: 
https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprow...@samsung.com/T/
- fixed some minor style issues and typos
- fixed lack of the attrs argument in ion, dmabuf, rapidio, fastrpc and
  vfio patches

v4: https://lore.kernel.org/linux-iommu/20200512121931.gd20...@lst.de/T/
- added for_each_sgtable_* wrappers and applied where possible
- added drm_prime_get_contiguous_size() and applied where possible
- applied drm_prime_sg_to_page_addr_arrays() where possible to remove page
  extraction from

[PATCH v10 13/30] drm: omapdrm: use common helper for extracting pages array

2020-09-04 Thread Marek Szyprowski
Use common helper for converting a sg_table object into struct
page pointer array.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/drm/omapdrm/omap_gem.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c 
b/drivers/gpu/drm/omapdrm/omap_gem.c
index d0d12d5dd76c..f67f223c6479 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -1297,10 +1297,9 @@ struct drm_gem_object *omap_gem_new_dmabuf(struct 
drm_device *dev, size_t size,
omap_obj->dma_addr = sg_dma_address(sgt->sgl);
} else {
/* Create pages list from sgt */
-   struct sg_page_iter iter;
struct page **pages;
unsigned int npages;
-   unsigned int i = 0;
+   unsigned int ret;
 
npages = DIV_ROUND_UP(size, PAGE_SIZE);
pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
@@ -1311,14 +1310,9 @@ struct drm_gem_object *omap_gem_new_dmabuf(struct 
drm_device *dev, size_t size,
}
 
omap_obj->pages = pages;
-
-   for_each_sg_page(sgt->sgl, &iter, sgt->orig_nents, 0) {
-   pages[i++] = sg_page_iter_page(&iter);
-   if (i > npages)
-   break;
-   }
-
-   if (WARN_ON(i != npages)) {
+   ret = drm_prime_sg_to_page_addr_arrays(sgt, pages, NULL,
+  npages);
+   if (ret) {
omap_gem_free_object(obj);
obj = ERR_PTR(-ENOMEM);
goto done;
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 17/30] drm: tegra: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
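
Note that the diff below also converts iommu_map_sg() to
iommu_map_sgtable(); unlike dma_map_sgtable(), that helper returns the
number of bytes mapped (zero on failure), which is why the code checks
the resulting size rather than an error code. A sketch with hypothetical
names:

#include <linux/errno.h>
#include <linux/iommu.h>
#include <linux/scatterlist.h>

static int example_iommu_map(struct iommu_domain *domain, unsigned long iova,
                             struct sg_table *sgt, size_t size)
{
        size_t mapped = iommu_map_sgtable(domain, iova, sgt,
                                          IOMMU_READ | IOMMU_WRITE);

        /* bytes mapped, not an entry count and not an errno */
        if (mapped < size)
                return -ENOMEM;
        return 0;
}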

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/drm/tegra/gem.c   | 27 ++-
 drivers/gpu/drm/tegra/plane.c | 15 +--
 2 files changed, 15 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 723df142a981..01d94befab11 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -98,8 +98,8 @@ static struct sg_table *tegra_bo_pin(struct device *dev, 
struct host1x_bo *bo,
 * the SG table needs to be copied to avoid overwriting any
 * other potential users of the original SG table.
 */
-   err = sg_alloc_table_from_sg(sgt, obj->sgt->sgl, 
obj->sgt->nents,
-GFP_KERNEL);
+   err = sg_alloc_table_from_sg(sgt, obj->sgt->sgl,
+obj->sgt->orig_nents, GFP_KERNEL);
if (err < 0)
goto free;
} else {
@@ -196,8 +196,7 @@ static int tegra_bo_iommu_map(struct tegra_drm *tegra, 
struct tegra_bo *bo)
 
bo->iova = bo->mm->start;
 
-   bo->size = iommu_map_sg(tegra->domain, bo->iova, bo->sgt->sgl,
-   bo->sgt->nents, prot);
+   bo->size = iommu_map_sgtable(tegra->domain, bo->iova, bo->sgt, prot);
if (!bo->size) {
dev_err(tegra->drm->dev, "failed to map buffer\n");
err = -ENOMEM;
@@ -264,8 +263,7 @@ static struct tegra_bo *tegra_bo_alloc_object(struct 
drm_device *drm,
 static void tegra_bo_free(struct drm_device *drm, struct tegra_bo *bo)
 {
if (bo->pages) {
-   dma_unmap_sg(drm->dev, bo->sgt->sgl, bo->sgt->nents,
-DMA_FROM_DEVICE);
+   dma_unmap_sgtable(drm->dev, bo->sgt, DMA_FROM_DEVICE, 0);
drm_gem_put_pages(&bo->gem, bo->pages, true, true);
sg_free_table(bo->sgt);
kfree(bo->sgt);
@@ -290,12 +288,9 @@ static int tegra_bo_get_pages(struct drm_device *drm, 
struct tegra_bo *bo)
goto put_pages;
}
 
-   err = dma_map_sg(drm->dev, bo->sgt->sgl, bo->sgt->nents,
-DMA_FROM_DEVICE);
-   if (err == 0) {
-   err = -EFAULT;
+   err = dma_map_sgtable(drm->dev, bo->sgt, DMA_FROM_DEVICE, 0);
+   if (err)
goto free_sgt;
-   }
 
return 0;
 
@@ -571,7 +566,7 @@ tegra_gem_prime_map_dma_buf(struct dma_buf_attachment 
*attach,
goto free;
}
 
-   if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0)
+   if (dma_map_sgtable(attach->dev, sgt, dir, 0))
goto free;
 
return sgt;
@@ -590,7 +585,7 @@ static void tegra_gem_prime_unmap_dma_buf(struct 
dma_buf_attachment *attach,
struct tegra_bo *bo = to_tegra_bo(gem);
 
if (bo->pages)
-   dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+   dma_unmap_sgtable(attach->dev, sgt, dir, 0);
 
sg_free_table(sgt);
kfree(sgt);
@@ -609,8 +604,7 @@ static int tegra_gem_prime_begin_cpu_access(struct dma_buf 
*buf,
struct drm_device *drm = gem->dev;
 
if (bo->pages)
-   dma_sync_sg_for_cpu(drm->dev, bo->sgt->sgl, bo->sgt->nents,
-   DMA_FROM_DEVICE);
+   dma_sync_sgtable_for_cpu(drm->dev, bo->sgt, DMA_FROM_DEVICE);
 
return 0;
 }
@@ -623,8 +617,7 @@ static int tegra_gem_prime_end_cpu_access(struct dma_buf 
*buf,

[PATCH v10 08/30] drm: i915: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

This driver creatively uses sg_table->orig_nents to store the size of the
allocated scatterlist and ignores the number of the entries returned by
dma_map_sg function. The sg_table->orig_nents is (mis)used to properly
free the (over)allocated scatterlist.

This patch only introduces the common DMA-mapping wrappers operating
directly on the struct sg_table objects to the dmabuf related functions,
so the other drivers, which might share buffers with i915 could rely on
the properly set nents and orig_nents values.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
Reviewed-by: Michael J. Ruhl 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   | 11 +++
 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c |  7 +++
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 2679380159fc..8a988592715b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -48,12 +48,9 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
dma_buf_attachment *attachme
src = sg_next(src);
}
 
-   if (!dma_map_sg_attrs(attachment->dev,
- st->sgl, st->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
-   ret = -ENOMEM;
+   ret = dma_map_sgtable(attachment->dev, st, dir, DMA_ATTR_SKIP_CPU_SYNC);
+   if (ret)
goto err_free_sg;
-   }
 
return st;
 
@@ -73,9 +70,7 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment 
*attachment,
 {
struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
 
-   dma_unmap_sg_attrs(attachment->dev,
-  sg->sgl, sg->nents, dir,
-  DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c 
b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
index debaf7b18ab5..be30b27e2926 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
@@ -28,10 +28,9 @@ static struct sg_table *mock_map_dma_buf(struct 
dma_buf_attachment *attachment,
sg = sg_next(sg);
}
 
-   if (!dma_map_sg(attachment->dev, st->sgl, st->nents, dir)) {
-   err = -ENOMEM;
+   err = dma_map_sgtable(attachment->dev, st, dir, 0);
+   if (err)
goto err_st;
-   }
 
return st;
 
@@ -46,7 +45,7 @@ static void mock_unmap_dma_buf(struct dma_buf_attachment 
*attachment,
   struct sg_table *st,
   enum dma_data_direction dir)
 {
-   dma_unmap_sg(attachment->dev, st->sgl, st->nents, dir);
+   dma_unmap_sgtable(attachment->dev, st, dir, 0);
sg_free_table(st);
kfree(st);
 }
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 24/30] drm: rcar-du: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

dma_map_sgtable() function returns zero or an error code, so adjust the
return value check for the vsp1_du_map_sg() function.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Laurent Pinchart 
---
 drivers/gpu/drm/rcar-du/rcar_du_vsp.c  | 3 +--
 drivers/media/platform/vsp1/vsp1_drm.c | 8 
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c 
b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
index f1a81c9b184d..a27bff999649 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
@@ -197,9 +197,8 @@ int rcar_du_vsp_map_fb(struct rcar_du_vsp *vsp, struct 
drm_framebuffer *fb,
goto fail;
 
ret = vsp1_du_map_sg(vsp->vsp, sgt);
-   if (!ret) {
+   if (ret) {
sg_free_table(sgt);
-   ret = -ENOMEM;
goto fail;
}
}
diff --git a/drivers/media/platform/vsp1/vsp1_drm.c 
b/drivers/media/platform/vsp1/vsp1_drm.c
index a4a45d68a6ef..86d5e3f4b1ff 100644
--- a/drivers/media/platform/vsp1/vsp1_drm.c
+++ b/drivers/media/platform/vsp1/vsp1_drm.c
@@ -912,8 +912,8 @@ int vsp1_du_map_sg(struct device *dev, struct sg_table *sgt)
 * skip cache sync. This will need to be revisited when support for
 * non-coherent buffers will be added to the DU driver.
 */
-   return dma_map_sg_attrs(vsp1->bus_master, sgt->sgl, sgt->nents,
-   DMA_TO_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
+   return dma_map_sgtable(vsp1->bus_master, sgt, DMA_TO_DEVICE,
+  DMA_ATTR_SKIP_CPU_SYNC);
 }
 EXPORT_SYMBOL_GPL(vsp1_du_map_sg);
 
@@ -921,8 +921,8 @@ void vsp1_du_unmap_sg(struct device *dev, struct sg_table 
*sgt)
 {
struct vsp1_device *vsp1 = dev_get_drvdata(dev);
 
-   dma_unmap_sg_attrs(vsp1->bus_master, sgt->sgl, sgt->nents,
-  DMA_TO_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(vsp1->bus_master, sgt, DMA_TO_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC);
 }
 EXPORT_SYMBOL_GPL(vsp1_du_unmap_sg);
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 23/30] drm: host1x: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
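
The conversion below also replaces open-coded for_each_sg() loops with the
sgtable variants, which pick the right entry count by themselves. A sketch
of the two views (function name hypothetical):

#include <linux/printk.h>
#include <linux/scatterlist.h>

static void example_two_views(struct sg_table *sgt)
{
        struct scatterlist *sg;
        size_t phys = 0, mapped = 0;
        unsigned int i;

        /* CPU view: iterates sgt->orig_nents entries */
        for_each_sgtable_sg(sgt, sg, i)
                phys += sg->length;

        /* DMA view: iterates sgt->nents entries, valid only when mapped */
        for_each_sgtable_dma_sg(sgt, sg, i)
                mapped += sg_dma_len(sg);

        pr_info("physical %zu bytes, DMA-mapped %zu bytes\n", phys, mapped);
}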

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/host1x/job.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 89b6c14b7392..82d0a60ba3f7 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -170,11 +170,9 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
goto unpin;
}
 
-   err = dma_map_sg(dev, sgt->sgl, sgt->nents, dir);
-   if (!err) {
-   err = -ENOMEM;
+   err = dma_map_sgtable(dev, sgt, dir, 0);
+   if (err)
goto unpin;
-   }
 
job->unpins[job->num_unpins].dev = dev;
job->unpins[job->num_unpins].dir = dir;
@@ -228,7 +226,7 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
}
 
if (host->domain) {
-   for_each_sg(sgt->sgl, sg, sgt->nents, j)
+   for_each_sgtable_sg(sgt, sg, j)
gather_size += sg->length;
gather_size = iova_align(>iova, gather_size);
 
@@ -240,9 +238,9 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
goto put;
}
 
-   err = iommu_map_sg(host->domain,
+   err = iommu_map_sgtable(host->domain,
iova_dma_addr(&host->iova, alloc),
-   sgt->sgl, sgt->nents, IOMMU_READ);
+   sgt, IOMMU_READ);
if (err == 0) {
__free_iova(&host->iova, alloc);
err = -EINVAL;
@@ -252,12 +250,9 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
job->unpins[job->num_unpins].size = gather_size;
phys_addr = iova_dma_addr(&host->iova, alloc);
} else if (sgt) {
-   err = dma_map_sg(host->dev, sgt->sgl, sgt->nents,
-DMA_TO_DEVICE);
-   if (!err) {
-   err = -ENOMEM;
+   err = dma_map_sgtable(host->dev, sgt, DMA_TO_DEVICE, 0);
+   if (err)
goto put;
-   }
 
job->unpins[job->num_unpins].dir = DMA_TO_DEVICE;
job->unpins[job->num_unpins].dev = host->dev;
@@ -660,8 +655,7 @@ void host1x_job_unpin(struct host1x_job *job)
}
 
if (unpin->dev && sgt)
-   dma_unmap_sg(unpin->dev, sgt->sgl, sgt->nents,
-unpin->dir);
+   dma_unmap_sgtable(unpin->dev, sgt, unpin->dir, 0);
 
host1x_bo_unpin(dev, unpin->bo, sgt);
host1x_bo_put(unpin->bo);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 16/30] drm: rockchip: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 23 +
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index 2970e534e2bb..cb50f2ba2e46 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -36,8 +36,8 @@ static int rockchip_gem_iommu_map(struct rockchip_gem_object 
*rk_obj)
 
rk_obj->dma_addr = rk_obj->mm.start;
 
-   ret = iommu_map_sg(private->domain, rk_obj->dma_addr, rk_obj->sgt->sgl,
-  rk_obj->sgt->nents, prot);
+   ret = iommu_map_sgtable(private->domain, rk_obj->dma_addr, rk_obj->sgt,
+   prot);
if (ret < rk_obj->base.size) {
DRM_ERROR("failed to map buffer: size=%zd request_size=%zd\n",
  ret, rk_obj->base.size);
@@ -98,11 +98,10 @@ static int rockchip_gem_get_pages(struct 
rockchip_gem_object *rk_obj)
 * TODO: Replace this by drm_clflush_sg() once it can be implemented
 * without relying on symbols that are not exported.
 */
-   for_each_sg(rk_obj->sgt->sgl, s, rk_obj->sgt->nents, i)
+   for_each_sgtable_sg(rk_obj->sgt, s, i)
sg_dma_address(s) = sg_phys(s);
 
-   dma_sync_sg_for_device(drm->dev, rk_obj->sgt->sgl, rk_obj->sgt->nents,
-  DMA_TO_DEVICE);
+   dma_sync_sgtable_for_device(drm->dev, rk_obj->sgt, DMA_TO_DEVICE);
 
return 0;
 
@@ -350,8 +349,8 @@ void rockchip_gem_free_object(struct drm_gem_object *obj)
if (private->domain) {
rockchip_gem_iommu_unmap(rk_obj);
} else {
-   dma_unmap_sg(drm->dev, rk_obj->sgt->sgl,
-rk_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(drm->dev, rk_obj->sgt,
+ DMA_BIDIRECTIONAL, 0);
}
drm_prime_gem_destroy(obj, rk_obj->sgt);
} else {
@@ -476,15 +475,13 @@ rockchip_gem_dma_map_sg(struct drm_device *drm,
struct sg_table *sg,
struct rockchip_gem_object *rk_obj)
 {
-   int count = dma_map_sg(drm->dev, sg->sgl, sg->nents,
-  DMA_BIDIRECTIONAL);
-   if (!count)
-   return -EINVAL;
+   int err = dma_map_sgtable(drm->dev, sg, DMA_BIDIRECTIONAL, 0);
+   if (err)
+   return err;
 
if (drm_prime_get_contiguous_size(sg) < attach->dmabuf->size) {
DRM_ERROR("failed to map sg_table to contiguous linear 
address.\n");
-   dma_unmap_sg(drm->dev, sg->sgl, sg->nents,
-DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(drm->dev, sg, DMA_BIDIRECTIONAL, 0);
return -EINVAL;
}
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 11/30] drm: mediatek: use common helper for extracting pages array

2020-09-04 Thread Marek Szyprowski
Use common helper for converting a sg_table object into struct
page pointer array.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Robin Murphy 
Acked-by: Chun-Kuang Hu 
---
 drivers/gpu/drm/mediatek/mtk_drm_gem.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c 
b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
index 3654ec732029..0583e557ad37 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -233,9 +233,7 @@ void *mtk_drm_gem_prime_vmap(struct drm_gem_object *obj)
 {
struct mtk_drm_gem_obj *mtk_gem = to_mtk_gem_obj(obj);
struct sg_table *sgt;
-   struct sg_page_iter iter;
unsigned int npages;
-   unsigned int i = 0;
 
if (mtk_gem->kvaddr)
return mtk_gem->kvaddr;
@@ -249,11 +247,8 @@ void *mtk_drm_gem_prime_vmap(struct drm_gem_object *obj)
if (!mtk_gem->pages)
goto out;
 
-   for_each_sg_page(sgt->sgl, &iter, sgt->orig_nents, 0) {
-   mtk_gem->pages[i++] = sg_page_iter_page(&iter);
-   if (i > npages)
-   break;
-   }
+   drm_prime_sg_to_page_addr_arrays(sgt, mtk_gem->pages, NULL, npages);
+
mtk_gem->kvaddr = vmap(mtk_gem->pages, npages, VM_MAP,
   pgprot_writecombine(PAGE_KERNEL));
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 04/30] drm: armada: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/armada/armada_gem.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_gem.c 
b/drivers/gpu/drm/armada/armada_gem.c
index 8005614d2e6b..a63008ce284d 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -379,7 +379,7 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment 
*attach,
struct armada_gem_object *dobj = drm_to_armada_gem(obj);
struct scatterlist *sg;
struct sg_table *sgt;
-   int i, num;
+   int i;
 
sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
if (!sgt)
@@ -395,22 +395,18 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment 
*attach,
 
mapping = dobj->obj.filp->f_mapping;
 
-   for_each_sg(sgt->sgl, sg, count, i) {
+   for_each_sgtable_sg(sgt, sg, i) {
struct page *page;
 
page = shmem_read_mapping_page(mapping, i);
-   if (IS_ERR(page)) {
-   num = i;
+   if (IS_ERR(page))
goto release;
-   }
 
sg_set_page(sg, page, PAGE_SIZE, 0);
}
 
-   if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0) {
-   num = sgt->nents;
+   if (dma_map_sgtable(attach->dev, sgt, dir, 0))
goto release;
-   }
} else if (dobj->page) {
/* Single contiguous page */
if (sg_alloc_table(sgt, 1, GFP_KERNEL))
@@ -418,7 +414,7 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment 
*attach,
 
sg_set_page(sgt->sgl, dobj->page, dobj->obj.size, 0);
 
-   if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0)
+   if (dma_map_sgtable(attach->dev, sgt, dir, 0))
goto free_table;
} else if (dobj->linear) {
/* Single contiguous physical region - no struct page */
@@ -432,8 +428,9 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment 
*attach,
return sgt;
 
  release:
-   for_each_sg(sgt->sgl, sg, num, i)
-   put_page(sg_page(sg));
+   for_each_sgtable_sg(sgt, sg, i)
+   if (sg_page(sg))
+   put_page(sg_page(sg));
  free_table:
sg_free_table(sgt);
  free_sgt:
@@ -449,11 +446,12 @@ static void armada_gem_prime_unmap_dma_buf(struct 
dma_buf_attachment *attach,
int i;
 
if (!dobj->linear)
-   dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+   dma_unmap_sgtable(attach->dev, sgt, dir, 0);
 
if (dobj->obj.filp) {
struct scatterlist *sg;
-   for_each_sg(sgt->sgl, sg, sgt->nents, i)
+
+   for_each_sgtable_sg(sgt, sg, i)
put_page(sg_page(sg));
}
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 14/30] drm: panfrost: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Steven Price 
Reviewed-by: Rob Herring 
---
 drivers/gpu/drm/panfrost/panfrost_gem.c | 4 ++--
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c 
b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 33355dd302f1..1a6cea0e0bd7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -41,8 +41,8 @@ static void panfrost_gem_free_object(struct drm_gem_object 
*obj)
 
for (i = 0; i < n_sgt; i++) {
if (bo->sgts[i].sgl) {
-   dma_unmap_sg(pfdev->dev, bo->sgts[i].sgl,
-bo->sgts[i].nents, 
DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(pfdev->dev, &bo->sgts[i],
+ DMA_BIDIRECTIONAL, 0);
sg_free_table(&bo->sgts[i]);
}
}
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..776448c527ea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -253,7 +253,7 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct 
panfrost_mmu *mmu,
struct io_pgtable_ops *ops = mmu->pgtbl_ops;
u64 start_iova = iova;
 
-   for_each_sg(sgt->sgl, sgl, sgt->nents, count) {
+   for_each_sgtable_dma_sg(sgt, sgl, count) {
unsigned long paddr = sg_dma_address(sgl);
size_t len = sg_dma_len(sgl);
 
@@ -517,10 +517,9 @@ static int panfrost_mmu_map_fault_addr(struct 
panfrost_device *pfdev, int as,
if (ret)
goto err_pages;
 
-   if (!dma_map_sg(pfdev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL)) {
-   ret = -EINVAL;
+   ret = dma_map_sgtable(pfdev->dev, sgt, DMA_BIDIRECTIONAL, 0);
+   if (ret)
goto err_map;
-   }
 
mmu_map_sg(pfdev, bomapping->mmu, addr,
   IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 21/30] drm: xen: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

Fix the code to refer to proper nents or orig_nents entries. This driver
reports the number of the pages in the imported scatterlist, so it should
refer to sg_table->orig_nents entry.

Signed-off-by: Marek Szyprowski 
Acked-by: Oleksandr Andrushchenko 
---
 drivers/gpu/drm/xen/xen_drm_front_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c 
b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index 39ff95b75357..0e57c80058b2 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -216,7 +216,7 @@ xen_drm_front_gem_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 
DRM_DEBUG("Imported buffer of size %zu with nents %u\n",
- size, sgt->nents);
+ size, sgt->orig_nents);
 
return &xen_obj->base;
 }
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 01/30] drm: prime: add common helper to check scatterlist contiguity

2020-09-04 Thread Marek Szyprowski
It is a common operation done by DRM drivers to check the contiguity
of the DMA-mapped buffer described by a scatterlist in the
sg_table object. Let's add a common helper for this operation.
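
A sketch of the intended use in a gem_prime_import_sg_table() callback;
example_create_object() is a hypothetical constructor, the rest is the
real API added by this patch:

#include <drm/drm_gem.h>
#include <drm/drm_prime.h>
#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

static struct drm_gem_object *
example_create_object(struct drm_device *dev, dma_addr_t addr, size_t size);

static struct drm_gem_object *
example_import_sg_table(struct drm_device *dev,
                        struct dma_buf_attachment *attach,
                        struct sg_table *sgt)
{
        /* the device can only address one contiguous DMA range */
        if (drm_prime_get_contiguous_size(sgt) < attach->dmabuf->size)
                return ERR_PTR(-EINVAL);

        return example_create_object(dev, sg_dma_address(sgt->sgl),
                                     attach->dmabuf->size);
}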

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 23 +++--
 drivers/gpu/drm/drm_prime.c  | 31 
 include/drm/drm_prime.h  |  2 ++
 3 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index 822edeadbab3..59b9ca207b42 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -471,26 +471,9 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
 {
struct drm_gem_cma_object *cma_obj;
 
-   if (sgt->nents != 1) {
-   /* check if the entries in the sg_table are contiguous */
-   dma_addr_t next_addr = sg_dma_address(sgt->sgl);
-   struct scatterlist *s;
-   unsigned int i;
-
-   for_each_sg(sgt->sgl, s, sgt->nents, i) {
-   /*
-* sg_dma_address(s) is only valid for entries
-* that have sg_dma_len(s) != 0
-*/
-   if (!sg_dma_len(s))
-   continue;
-
-   if (sg_dma_address(s) != next_addr)
-   return ERR_PTR(-EINVAL);
-
-   next_addr = sg_dma_address(s) + sg_dma_len(s);
-   }
-   }
+   /* check if the entries in the sg_table are contiguous */
+   if (drm_prime_get_contiguous_size(sgt) < attach->dmabuf->size)
+   return ERR_PTR(-EINVAL);
 
/* Create a CMA GEM buffer. */
cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 1693aa7c14b5..4ed5ed1f078c 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -825,6 +825,37 @@ struct sg_table *drm_prime_pages_to_sg(struct page 
**pages, unsigned int nr_page
 }
 EXPORT_SYMBOL(drm_prime_pages_to_sg);
 
+/**
+ * drm_prime_get_contiguous_size - returns the contiguous size of the buffer
+ * @sgt: sg_table describing the buffer to check
+ *
+ * This helper calculates the contiguous size in the DMA address space
+ * of the buffer described by the provided sg_table.
+ *
+ * This is useful for implementing
+ * _gem_object_funcs.gem_prime_import_sg_table.
+ */
+unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
+{
+   dma_addr_t expected = sg_dma_address(sgt->sgl);
+   struct scatterlist *sg;
+   unsigned long size = 0;
+   int i;
+
+   for_each_sgtable_dma_sg(sgt, sg, i) {
+   unsigned int len = sg_dma_len(sg);
+
+   if (!len)
+   break;
+   if (sg_dma_address(sg) != expected)
+   break;
+   expected += len;
+   size += len;
+   }
+   return size;
+}
+EXPORT_SYMBOL(drm_prime_get_contiguous_size);
+
 /**
  * drm_gem_prime_export - helper library implementation of the export callback
  * @obj: GEM object to export
diff --git a/include/drm/drm_prime.h b/include/drm/drm_prime.h
index 9af7422b44cf..47ef11614627 100644
--- a/include/drm/drm_prime.h
+++ b/include/drm/drm_prime.h
@@ -92,6 +92,8 @@ struct sg_table *drm_prime_pages_to_sg(struct page **pages, 
unsigned int nr_page
 struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
 int flags);
 
+unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt);
+
 /* helper functions for importing */
 struct drm_gem_object *drm_gem_prime_import_dev(struct drm_device *dev,
struct dma_buf *dma_buf,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 02/30] drm: prime: use sgtable iterators in drm_prime_sg_to_page_addr_arrays()

2020-09-04 Thread Marek Szyprowski
Replace the current hand-crafted code for extracting pages and DMA
addresses from the given scatterlist by the much more robust
code based on the generic scatterlist iterators and recently
introduced sg_table-based wrappers. The resulting code is simple and
easy to understand, so the comment describing the old code is no
longer needed.
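
For reference, a sketch of the two page-granular iterators the new code
relies on; the handle_*() consumers are hypothetical:

#include <linux/scatterlist.h>

static void handle_page(struct page *page) { }          /* hypothetical */
static void handle_addr(dma_addr_t addr) { }            /* hypothetical */

static void example_page_iterators(struct sg_table *sgt)
{
        struct sg_page_iter piter;
        struct sg_dma_page_iter diter;

        /* every CPU page backing the buffer (orig_nents-based) */
        for_each_sgtable_page(sgt, &piter, 0)
                handle_page(sg_page_iter_page(&piter));

        /* every PAGE_SIZE chunk of the DMA mapping (nents-based) */
        for_each_sgtable_dma_page(sgt, &diter, 0)
                handle_addr(sg_page_iter_dma_address(&diter));
}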

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
Reviewed-by: Robin Murphy 
---
 drivers/gpu/drm/drm_prime.c | 49 -
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 4ed5ed1f078c..c5e796d4a489 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -990,45 +990,26 @@ EXPORT_SYMBOL(drm_gem_prime_import);
 int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
 dma_addr_t *addrs, int max_entries)
 {
-   unsigned count;
-   struct scatterlist *sg;
-   struct page *page;
-   u32 page_len, page_index;
-   dma_addr_t addr;
-   u32 dma_len, dma_index;
-
-   /*
-* Scatterlist elements contains both pages and DMA addresses, but
-* one shoud not assume 1:1 relation between them. The sg->length is
-* the size of the physical memory chunk described by the sg->page,
-* while sg_dma_len(sg) is the size of the DMA (IO virtual) chunk
-* described by the sg_dma_address(sg).
-*/
-   page_index = 0;
-   dma_index = 0;
-   for_each_sg(sgt->sgl, sg, sgt->nents, count) {
-   page_len = sg->length;
-   page = sg_page(sg);
-   dma_len = sg_dma_len(sg);
-   addr = sg_dma_address(sg);
-
-   while (pages && page_len > 0) {
-   if (WARN_ON(page_index >= max_entries))
+   struct sg_dma_page_iter dma_iter;
+   struct sg_page_iter page_iter;
+   struct page **p = pages;
+   dma_addr_t *a = addrs;
+
+   if (pages) {
+   for_each_sgtable_page(sgt, &page_iter, 0) {
+   if (WARN_ON(p - pages >= max_entries))
return -1;
-   pages[page_index] = page;
-   page++;
-   page_len -= PAGE_SIZE;
-   page_index++;
+   *p++ = sg_page_iter_page(&page_iter);
}
-   while (addrs && dma_len > 0) {
-   if (WARN_ON(dma_index >= max_entries))
+   }
+   if (addrs) {
+   for_each_sgtable_dma_page(sgt, &dma_iter, 0) {
+   if (WARN_ON(a - addrs >= max_entries))
return -1;
-   addrs[dma_index] = addr;
-   addr += PAGE_SIZE;
-   dma_len -= PAGE_SIZE;
-   dma_index++;
+   *a++ = sg_page_iter_dma_address(&dma_iter);
}
}
+
return 0;
 }
 EXPORT_SYMBOL(drm_prime_sg_to_page_addr_arrays);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v9 14/32] drm: omapdrm: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
Hi again,

On 04.09.2020 14:06, Marek Szyprowski wrote:
> Hi Tomi,
>
> On 02.09.2020 10:00, Tomi Valkeinen wrote:
>> On 01/09/2020 22:33, Robin Murphy wrote:
>>> On 2020-08-26 07:32, Marek Szyprowski wrote:
>>>> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() 
>>>> function
>>>> returns the number of the created entries in the DMA address space.
>>>> However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
>>>> dma_unmap_sg must be called with the original number of the entries
>>>> passed to the dma_map_sg().
>>>>
>>>> struct sg_table is a common structure used for describing a 
>>>> non-contiguous
>>>> memory buffer, used commonly in the DRM and graphics subsystems. It
>>>> consists of a scatterlist with memory pages and DMA addresses (sgl 
>>>> entry),
>>>> as well as the number of scatterlist entries: CPU pages (orig_nents 
>>>> entry)
>>>> and DMA mapped pages (nents entry).
>>>>
>>>> It turned out that it was a common mistake to misuse nents and 
>>>> orig_nents
>>>> entries, calling DMA-mapping functions with a wrong number of 
>>>> entries or
>>>> ignoring the number of mapped entries returned by the dma_map_sg()
>>>> function.
>>>>
>>>> Fix the code to refer to proper nents or orig_nents entries. This 
>>>> driver
>>>> checks for a buffer contiguity in DMA address space, so it should test
>>>> sg_table->nents entry.
>>>>
>>>> Signed-off-by: Marek Szyprowski 
>>>> ---
>>>>    drivers/gpu/drm/omapdrm/omap_gem.c | 6 +++---
>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c 
>>>> b/drivers/gpu/drm/omapdrm/omap_gem.c
>>>> index ff0c4b0c3fd0..a7a9a0afe2b6 100644
>>>> --- a/drivers/gpu/drm/omapdrm/omap_gem.c
>>>> +++ b/drivers/gpu/drm/omapdrm/omap_gem.c
>>>> @@ -48,7 +48,7 @@ struct omap_gem_object {
>>>>     *   OMAP_BO_MEM_DMA_API flag set)
>>>>     *
>>>>     * - buffers imported from dmabuf (with the 
>>>> OMAP_BO_MEM_DMABUF flag set)
>>>> - *   if they are physically contiguous (when sgt->orig_nents 
>>>> == 1)
>>>> + *   if they are physically contiguous (when sgt->nents == 1)
>>> Hmm, if this really does mean *physically* contiguous - i.e. if 
>>> buffers might be shared between
>>> DMA-translatable and non-DMA-translatable devices - then these 
>>> changes might not be appropriate. If
>>> not and it only actually means DMA-contiguous, then it would be good 
>>> to clarify the comments to that
>>> effect.
>>>
>>> Can anyone familiar with omapdrm clarify what exactly the case is 
>>> here? I know that IOMMUs might be
>>> involved to some degree, and I've skimmed the interconnect chapters 
>>> of enough OMAP TRMs to be scared
>>> by the reference to the tiler aperture in the context below :)
>> DSS (like many other IPs in OMAP) does not have any MMU/PAT, and can 
>> only use contiguous buffers
>> (contiguous in the RAM).
>>
>> There's a special case with TILER (which is not part of DSS but of 
>> the memory subsystem, but it's
>> still handled internally by the omapdrm driver), which has a PAT. PAT 
>> can create a contiguous view
>> of scattered pages, and DSS can then use this contiguous view ("tiler 
>> aperture", which to DSS looks
>> just like normal contiguous memory).
>>
>> Note that omapdrm does not use dma_map_sg() & co. mentioned in the 
>> patch description.
>>
>> If there's no MMU/PAT, is orig_nents always the same as nents? Or can 
>> we have multiple physically
>> contiguous pages listed separately in the sgt (so orig_nents > 1) but 
>> as the pages form one big
>> contiguous area, nents == 1?
>
> Well, when the DMA-mapping API is properly used, nents and orig_nents
> differ only once the scatterlist has been mapped for DMA.
>
> For the mentioned case, even if the creator of the buffer used only 
> the pages that are consecutive in the physical memory, he is free to 
> choose either to set nents/orig_nents to 1 and length to n*PAGE_SIZE or 
> set nents/orig_nents to n and length to PAGE_SIZE for each. Then the 
> buffer chunks might be merged, but this is done by the DMA-mapping 
> code. For your case, without any call to DMA-mapping, you can only 
> assume that the buffer is contiguous in physical memory if orig_nents 
> is 1.
>
> I've changed the use of nents to orig_nents to make things consistent 
> - this code operates only on the unmapped buffers. I want to ensure 
> that anyone who will potentially copy this code, won't make the 
> nents/orig_nents mistake in the future.

I've just noticed that I've read my patch (the diff) in the reverse 
order, I'm sorry. The omapdrm code is right, this patch should be dropped.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v9 14/32] drm: omapdrm: fix common struct sg_table related issues

2020-09-04 Thread Marek Szyprowski
Hi Tomi,

On 02.09.2020 10:00, Tomi Valkeinen wrote:
> On 01/09/2020 22:33, Robin Murphy wrote:
>> On 2020-08-26 07:32, Marek Szyprowski wrote:
>>> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
>>> returns the number of the created entries in the DMA address space.
>>> However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
>>> dma_unmap_sg must be called with the original number of the entries
>>> passed to the dma_map_sg().
>>>
>>> struct sg_table is a common structure used for describing a non-contiguous
>>> memory buffer, used commonly in the DRM and graphics subsystems. It
>>> consists of a scatterlist with memory pages and DMA addresses (sgl entry),
>>> as well as the number of scatterlist entries: CPU pages (orig_nents entry)
>>> and DMA mapped pages (nents entry).
>>>
>>> It turned out that it was a common mistake to misuse nents and orig_nents
>>> entries, calling DMA-mapping functions with a wrong number of entries or
>>> ignoring the number of mapped entries returned by the dma_map_sg()
>>> function.
>>>
>>> Fix the code to refer to proper nents or orig_nents entries. This driver
>>> checks for a buffer contiguity in DMA address space, so it should test
>>> sg_table->nents entry.
>>>
>>> Signed-off-by: Marek Szyprowski 
>>> ---
>>>    drivers/gpu/drm/omapdrm/omap_gem.c | 6 +++---
>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c 
>>> b/drivers/gpu/drm/omapdrm/omap_gem.c
>>> index ff0c4b0c3fd0..a7a9a0afe2b6 100644
>>> --- a/drivers/gpu/drm/omapdrm/omap_gem.c
>>> +++ b/drivers/gpu/drm/omapdrm/omap_gem.c
>>> @@ -48,7 +48,7 @@ struct omap_gem_object {
>>>     *   OMAP_BO_MEM_DMA_API flag set)
>>>     *
>>>     * - buffers imported from dmabuf (with the OMAP_BO_MEM_DMABUF flag 
>>> set)
>>> - *   if they are physically contiguous (when sgt->orig_nents == 1)
>>> + *   if they are physically contiguous (when sgt->nents == 1)
>> Hmm, if this really does mean *physically* contiguous - i.e. if buffers 
>> might be shared between
>> DMA-translatable and non-DMA-translatable devices - then these changes might 
>> not be appropriate. If
>> not and it only actually means DMA-contiguous, then it would be good to 
>> clarify the comments to that
>> effect.
>>
>> Can anyone familiar with omapdrm clarify what exactly the case is here? I 
>> know that IOMMUs might be
>> involved to some degree, and I've skimmed the interconnect chapters of 
>> enough OMAP TRMs to be scared
>> by the reference to the tiler aperture in the context below :)
> DSS (like many other IPs in OMAP) does not have any MMU/PAT, and can only use 
> contiguous buffers
> (contiguous in the RAM).
>
> There's a special case with TILER (which is not part of DSS but of the memory 
> subsystem, but it's
> still handled internally by the omapdrm driver), which has a PAT. PAT can 
> create a contiguous view
> of scattered pages, and DSS can then use this contiguous view ("tiler 
> aperture", which to DSS looks
> just like normal contiguous memory).
>
> Note that omapdrm does not use dma_map_sg() & co. mentioned in the patch 
> description.
>
> If there's no MMU/PAT, is orig_nents always the same as nents? Or can we have 
> multiple physically
> contiguous pages listed separately in the sgt (so orig_nents > 1) but as the 
> pages form one big
> contiguous area, nents == 1?

Well, when the DMA-mapping API is properly used, nents and orig_nents
differ only once the scatterlist has been mapped for DMA.

For the mentioned case, even if the creator of the buffer used only the 
pages that are consecutive in the physical memory, he is free to choose 
either to set nents/orig_nents to 1 and length to n*PAGE_SIZE or set 
nents/orig_nents to n and length to PAGE_SIZE for each. Then the buffer 
chunks might be merged, but this is done by the DMA-mapping code. For 
your case, without any call to DMA-mapping, you can only assume that the 
buffer is contiguous in physical memory if orig_nents is 1.
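
In code, the rule from the last paragraph boils down to something like
this sketch (function name hypothetical):

#include <linux/scatterlist.h>

/* valid only for a table that has NOT been DMA-mapped yet */
static bool example_is_phys_contiguous(struct sg_table *sgt)
{
        /*
         * A multi-entry table may still happen to be physically
         * contiguous, but orig_nents == 1 is the only guarantee
         * available without mapping the buffer.
         */
        return sgt->orig_nents == 1;
}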

I've changed the use of nents to orig_nents to make things consistent - 
this code operates only on the unmapped buffers. I want to ensure that 
anyone who will potentially copy this code, won't make the 
nents/orig_nents mistake in the future.

If you don't like it, we can drop this patch, because it won't change 
the way the driver works.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v9 31/32] media: pci: fix common ALSA DMA-mapping related codes

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg() returns the
number of created entries in the DMA address space. However, the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
made with the original number of entries passed to dma_map_sg(). The
sg_table->nents entry in turn holds the result of the dma_map_sg() call,
as stated in include/linux/scatterlist.h. Adapt the code to obey those
rules.
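
A condensed sketch of those rules (illustrative only; nr_pages stands for
the driver's original entry count):

	int mapped = dma_map_sg(dev, sglist, nr_pages, DMA_FROM_DEVICE);

	if (mapped == 0)
		return -ENOMEM;
	/* walk the mapped DMA chunks using 'mapped' ... */

	/* ... but sync and unmap with the ORIGINAL entry count: */
	dma_sync_sg_for_cpu(dev, sglist, nr_pages, DMA_FROM_DEVICE);
	dma_unmap_sg(dev, sglist, nr_pages, DMA_FROM_DEVICE);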

Signed-off-by: Marek Szyprowski 
---
 drivers/media/pci/cx23885/cx23885-alsa.c | 2 +-
 drivers/media/pci/cx25821/cx25821-alsa.c | 2 +-
 drivers/media/pci/cx88/cx88-alsa.c   | 2 +-
 drivers/media/pci/saa7134/saa7134-alsa.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/media/pci/cx23885/cx23885-alsa.c 
b/drivers/media/pci/cx23885/cx23885-alsa.c
index df44ed7393a0..3f366e4e4685 100644
--- a/drivers/media/pci/cx23885/cx23885-alsa.c
+++ b/drivers/media/pci/cx23885/cx23885-alsa.c
@@ -129,7 +129,7 @@ static int cx23885_alsa_dma_unmap(struct cx23885_audio_dev 
*dev)
if (!buf->sglen)
return 0;
 
-   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->sglen, PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->nr_pages, PCI_DMA_FROMDEVICE);
buf->sglen = 0;
return 0;
 }
diff --git a/drivers/media/pci/cx25821/cx25821-alsa.c 
b/drivers/media/pci/cx25821/cx25821-alsa.c
index 301616426d8a..c40304d33776 100644
--- a/drivers/media/pci/cx25821/cx25821-alsa.c
+++ b/drivers/media/pci/cx25821/cx25821-alsa.c
@@ -193,7 +193,7 @@ static int cx25821_alsa_dma_unmap(struct cx25821_audio_dev 
*dev)
if (!buf->sglen)
return 0;
 
-   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->sglen, PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->nr_pages, PCI_DMA_FROMDEVICE);
buf->sglen = 0;
return 0;
 }
diff --git a/drivers/media/pci/cx88/cx88-alsa.c 
b/drivers/media/pci/cx88/cx88-alsa.c
index 7d7aceecc985..3c6fe6ceb0b7 100644
--- a/drivers/media/pci/cx88/cx88-alsa.c
+++ b/drivers/media/pci/cx88/cx88-alsa.c
@@ -332,7 +332,7 @@ static int cx88_alsa_dma_unmap(struct cx88_audio_dev *dev)
if (!buf->sglen)
return 0;
 
-   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->sglen,
+   dma_unmap_sg(&dev->pci->dev, buf->sglist, buf->nr_pages,
 PCI_DMA_FROMDEVICE);
buf->sglen = 0;
return 0;
diff --git a/drivers/media/pci/saa7134/saa7134-alsa.c 
b/drivers/media/pci/saa7134/saa7134-alsa.c
index 544ca57eee75..398c47ff473d 100644
--- a/drivers/media/pci/saa7134/saa7134-alsa.c
+++ b/drivers/media/pci/saa7134/saa7134-alsa.c
@@ -313,7 +313,7 @@ static int saa7134_alsa_dma_unmap(struct saa7134_dev *dev)
if (!dma->sglen)
return 0;
 
-   dma_unmap_sg(&dev->pci->dev, dma->sglist, dma->sglen, PCI_DMA_FROMDEVICE);
+   dma_unmap_sg(&dev->pci->dev, dma->sglist, dma->nr_pages, PCI_DMA_FROMDEVICE);
dma->sglen = 0;
return 0;
 }
-- 
2.17.1



[PATCH v9 28/32] misc: fastrpc: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).
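
For reference, a simplified view of the structure (the real definition
lives in include/linux/scatterlist.h):

	struct sg_table {
		struct scatterlist *sgl;	/* the list */
		unsigned int nents;		/* number of DMA mapped entries */
		unsigned int orig_nents;	/* original size of the list */
	};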

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
---
 drivers/misc/fastrpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 7939c55daceb..9d6867749316 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -518,7 +518,7 @@ fastrpc_map_dma_buf(struct dma_buf_attachment *attachment,
 
	table = &a->sgt;
 
-   if (!dma_map_sg(attachment->dev, table->sgl, table->nents, dir))
+   if (!dma_map_sgtable(attachment->dev, table, dir, 0))
return ERR_PTR(-ENOMEM);
 
return table;
@@ -528,7 +528,7 @@ static void fastrpc_unmap_dma_buf(struct dma_buf_attachment 
*attach,
  struct sg_table *table,
  enum dma_data_direction dir)
 {
-   dma_unmap_sg(attach->dev, table->sgl, table->nents, dir);
+   dma_unmap_sgtable(attach->dev, table, dir, 0);
 }
 
 static void fastrpc_release(struct dma_buf *dmabuf)
-- 
2.17.1



[PATCH v9 32/32] videobuf2: use sgtable-based scatterlist wrappers

2020-08-26 Thread Marek Szyprowski
Use the recently introduced common wrappers operating directly on the
struct sg_table objects and the scatterlist page iterators to make the
code a bit more compact, robust, easier to follow and copy/paste safe.

No functional change, because the code already properly did all the
scatterlist-related calls.

Signed-off-by: Marek Szyprowski 
---
 .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
 .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
 .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
 3 files changed, 31 insertions(+), 47 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index ec3446cc45b8..1b242d844dde 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct 
sg_table *sgt)
unsigned int i;
unsigned long size = 0;
 
-   for_each_sg(sgt->sgl, s, sgt->nents, i) {
+   for_each_sgtable_dma_sg(sgt, s, i) {
if (sg_dma_address(s) != expected)
break;
-   expected = sg_dma_address(s) + sg_dma_len(s);
+   expected += sg_dma_len(s);
size += sg_dma_len(s);
}
return size;
@@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
if (!sgt)
return;
 
-   dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
-  buf->dma_dir);
+   dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
 }
 
 static void vb2_dc_finish(void *buf_priv)
@@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
if (!sgt)
return;
 
-   dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);
+   dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
 }
 
 /*/
@@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
 * memory locations do not require any explicit cache
 * maintenance prior or after being used by the device.
 */
-   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
-  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sgt);
kfree(attach);
db_attach->priv = NULL;
@@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
 
/* release any previous cache */
if (attach->dma_dir != DMA_NONE) {
-   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
-  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
attach->dma_dir = DMA_NONE;
}
 
@@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
 * mapping to the client with new direction, no cache sync
 * required see comment in vb2_dc_dmabuf_ops_detach()
 */
-   sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
- dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
-   if (!sgt->nents) {
+   if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
+   DMA_ATTR_SKIP_CPU_SYNC)) {
pr_err("failed to map scatterlist\n");
mutex_unlock(lock);
return ERR_PTR(-EIO);
@@ -455,8 +453,8 @@ static void vb2_dc_put_userptr(void *buf_priv)
 * No need to sync to CPU, it's already synced to the CPU
 * since the finish() memop will have been called before this.
 */
-   dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
-  buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(buf->dev, sgt, buf->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
pages = frame_vector_pages(buf->vec);
/* sgt should exist only if vector contains pages... */
BUG_ON(IS_ERR(pages));
@@ -553,9 +551,8 @@ static void *vb2_dc_get_userptr(struct device *dev, 
unsigned long vaddr,
 * No need to sync to the device, this will happen later when the
 * prepare() memop is called.
 */
-   sgt->nents = dma_map_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
- buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
-   if (sgt->nents <= 0) {
+ 

[PATCH v9 30/32] samples: vfio-mdev/mbochs: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

While touching this code, also add the missing call to dma_unmap_sgtable().

Signed-off-by: Marek Szyprowski 
---
 samples/vfio-mdev/mbochs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
index 3cc5e5921682..e03068917273 100644
--- a/samples/vfio-mdev/mbochs.c
+++ b/samples/vfio-mdev/mbochs.c
@@ -846,7 +846,7 @@ static struct sg_table *mbochs_map_dmabuf(struct 
dma_buf_attachment *at,
if (sg_alloc_table_from_pages(sg, dmabuf->pages, dmabuf->pagecount,
  0, dmabuf->mode.size, GFP_KERNEL) < 0)
goto err2;
-   if (!dma_map_sg(at->dev, sg->sgl, sg->nents, direction))
+   if (dma_map_sgtable(at->dev, sg, direction, 0))
goto err3;
 
return sg;
@@ -868,6 +868,7 @@ static void mbochs_unmap_dmabuf(struct dma_buf_attachment 
*at,
 
dev_dbg(dev, "%s: %d\n", __func__, dmabuf->id);
 
+   dma_unmap_sgtable(at->dev, sg, direction, 0);
sg_free_table(sg);
kfree(sg);
 }
-- 
2.17.1



[PATCH v9 27/32] staging: tegra-vde: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Dmitry Osipenko 
---
 drivers/staging/media/tegra-vde/iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/media/tegra-vde/iommu.c 
b/drivers/staging/media/tegra-vde/iommu.c
index 6af863d92123..adf8dc7ee25c 100644
--- a/drivers/staging/media/tegra-vde/iommu.c
+++ b/drivers/staging/media/tegra-vde/iommu.c
@@ -36,8 +36,8 @@ int tegra_vde_iommu_map(struct tegra_vde *vde,
 
	addr = iova_dma_addr(&vde->iova, iova);
 
-   size = iommu_map_sg(vde->domain, addr, sgt->sgl, sgt->nents,
-   IOMMU_READ | IOMMU_WRITE);
+   size = iommu_map_sgtable(vde->domain, addr, sgt,
+IOMMU_READ | IOMMU_WRITE);
if (!size) {
		__free_iova(&vde->iova, iova);
return -ENXIO;
-- 
2.17.1



[PATCH v9 24/32] drm: host1x: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/host1x/job.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 89b6c14b7392..82d0a60ba3f7 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -170,11 +170,9 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
goto unpin;
}
 
-   err = dma_map_sg(dev, sgt->sgl, sgt->nents, dir);
-   if (!err) {
-   err = -ENOMEM;
+   err = dma_map_sgtable(dev, sgt, dir, 0);
+   if (err)
goto unpin;
-   }
 
job->unpins[job->num_unpins].dev = dev;
job->unpins[job->num_unpins].dir = dir;
@@ -228,7 +226,7 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
}
 
if (host->domain) {
-   for_each_sg(sgt->sgl, sg, sgt->nents, j)
+   for_each_sgtable_sg(sgt, sg, j)
gather_size += sg->length;
	gather_size = iova_align(&host->iova, gather_size);
 
@@ -240,9 +238,9 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
goto put;
}
 
-   err = iommu_map_sg(host->domain,
+   err = iommu_map_sgtable(host->domain,
 iova_dma_addr(&host->iova, alloc),
-   sgt->sgl, sgt->nents, IOMMU_READ);
+   sgt, IOMMU_READ);
if (err == 0) {
		__free_iova(&host->iova, alloc);
err = -EINVAL;
@@ -252,12 +250,9 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
job->unpins[job->num_unpins].size = gather_size;
	phys_addr = iova_dma_addr(&host->iova, alloc);
} else if (sgt) {
-   err = dma_map_sg(host->dev, sgt->sgl, sgt->nents,
-DMA_TO_DEVICE);
-   if (!err) {
-   err = -ENOMEM;
+   err = dma_map_sgtable(host->dev, sgt, DMA_TO_DEVICE, 0);
+   if (err)
goto put;
-   }
 
job->unpins[job->num_unpins].dir = DMA_TO_DEVICE;
job->unpins[job->num_unpins].dev = host->dev;
@@ -660,8 +655,7 @@ void host1x_job_unpin(struct host1x_job *job)
}
 
if (unpin->dev && sgt)
-   dma_unmap_sg(unpin->dev, sgt->sgl, sgt->nents,
-unpin->dir);
+   dma_unmap_sgtable(unpin->dev, sgt, unpin->dir, 0);
 
host1x_bo_unpin(dev, unpin->bo, sgt);
host1x_bo_put(unpin->bo);
-- 
2.17.1



[PATCH v9 25/32] drm: rcar-du: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

The dma_map_sgtable() function returns zero or an error code, so adjust
the return value check for the vsp1_du_map_sg() function accordingly.
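
The two return conventions side by side (an illustrative sketch, not code
from this patch):

	/* dma_map_sg(): number of mapped entries, 0 on failure */
	nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_TO_DEVICE);

	/* dma_map_sgtable(): 0 on success, a negative error code on failure */
	ret = dma_map_sgtable(dev, sgt, DMA_TO_DEVICE, 0);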

Signed-off-by: Marek Szyprowski 
Reviewed-by: Laurent Pinchart 
---
 drivers/gpu/drm/rcar-du/rcar_du_vsp.c  | 3 +--
 drivers/media/platform/vsp1/vsp1_drm.c | 8 
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c 
b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
index f1a81c9b184d..a27bff999649 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
@@ -197,9 +197,8 @@ int rcar_du_vsp_map_fb(struct rcar_du_vsp *vsp, struct 
drm_framebuffer *fb,
goto fail;
 
ret = vsp1_du_map_sg(vsp->vsp, sgt);
-   if (!ret) {
+   if (ret) {
sg_free_table(sgt);
-   ret = -ENOMEM;
goto fail;
}
}
diff --git a/drivers/media/platform/vsp1/vsp1_drm.c 
b/drivers/media/platform/vsp1/vsp1_drm.c
index a4a45d68a6ef..86d5e3f4b1ff 100644
--- a/drivers/media/platform/vsp1/vsp1_drm.c
+++ b/drivers/media/platform/vsp1/vsp1_drm.c
@@ -912,8 +912,8 @@ int vsp1_du_map_sg(struct device *dev, struct sg_table *sgt)
 * skip cache sync. This will need to be revisited when support for
 * non-coherent buffers will be added to the DU driver.
 */
-   return dma_map_sg_attrs(vsp1->bus_master, sgt->sgl, sgt->nents,
-   DMA_TO_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
+   return dma_map_sgtable(vsp1->bus_master, sgt, DMA_TO_DEVICE,
+  DMA_ATTR_SKIP_CPU_SYNC);
 }
 EXPORT_SYMBOL_GPL(vsp1_du_map_sg);
 
@@ -921,8 +921,8 @@ void vsp1_du_unmap_sg(struct device *dev, struct sg_table 
*sgt)
 {
struct vsp1_device *vsp1 = dev_get_drvdata(dev);
 
-   dma_unmap_sg_attrs(vsp1->bus_master, sgt->sgl, sgt->nents,
-  DMA_TO_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(vsp1->bus_master, sgt, DMA_TO_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC);
 }
 EXPORT_SYMBOL_GPL(vsp1_du_unmap_sg);
 
-- 
2.17.1



[PATCH v9 00/32] DRM: fix struct sg_table nents vs. orig_nents misuse

2020-08-26 Thread Marek Szyprowski
Dear All,

During the Exynos DRM GEM rework and fixing the issues in the
drm_prime_sg_to_page_addr_arrays() function [1] I've noticed that most
drivers in the DRM framework incorrectly use the nents and orig_nents
entries of the struct sg_table.
the struct sg_table.

In case of most DMA-mapping implementations, exchanging those two entries
or using nents for all loops over the scatterlist is harmless, because
they both have the same value. There exist, however, DMA-mapping
implementations for which such incorrect usage breaks things. The nents
returned by dma_map_sg() might be lower than the nents passed as its
parameter, and this is perfectly fine. The DMA framework or IOMMU is
allowed to join consecutive chunks while mapping if such an operation is
supported by the underlying HW (bus, bridge, IOMMU, etc). An example of a
case where dma_map_sg() might return 1 'DMA' chunk for 4 'physical' pages
is described here [2].

The DMA-mapping framework documentation [3] states that dma_map_sg()
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must
be made with the original number of entries passed to dma_map_sg(). The
common pattern in DRM drivers was to assign the dma_map_sg() return value
to sg_table->nents and use that value for the subsequent calls to the
dma_sync_sg_* or dma_unmap_sg functions. The code also iterated nents
times to access the pages stored in the processed scatterlist, while it
should have used orig_nents as the number of the page entries.

I've tried to identify all such incorrect usage of sg_table->nents, and
this is the result of my research. It looks like the incorrect pattern has
been copied across many drivers, mainly in the DRM subsystem. Too bad that
in most cases it even worked correctly, as long as the system used a
simple, linear DMA-mapping implementation, for which swapping nents and
orig_nents doesn't make any difference. To avoid similar issues in the
future, I've introduced common wrappers for the DMA-mapping calls, which
operate directly on the sg_table objects. I've also added wrappers for
iterating over the scatterlists stored in the sg_table objects and applied
them where possible. This, together with some common DRM prime helpers,
allowed me to almost completely get rid of the nents/orig_nents usage in
the drivers. I hope that such a change makes the code robust, easier to
follow and copy/paste safe.

The biggest TODO is the DRM/i915 driver and I don't feel brave enough to
fix it fully. The driver creatively uses sg_table->orig_nents to store the
size of the allocated scatterlist and ignores the number of entries
returned by the dma_map_sg() function. In this patchset I only fixed the
sg_table objects exported by the dmabuf related functions. I hope that I
didn't break anything there.

Patches are based on top of Linux next-20200825. The required changes to
the DMA-mapping framework have already been merged in v5.8-rc1.

I would like to ask for merging patches 1-27 via the DRM misc tree.

Best regards,
Marek Szyprowski


References:

[1] https://lkml.org/lkml/2020/3/27/555
[2] https://lkml.org/lkml/2020/3/29/65
[3] Documentation/DMA-API-HOWTO.txt
[4] 
https://lore.kernel.org/linux-iommu/20200512121931.gd20...@lst.de/T/#ma18c958a48c3b241d5409517fa7d192eef87459b

Changelog:

v9:
- rebased onto Linux next-20200825, which is based on v5.9-rc2; fixed conflicts
- dropped merged patches

v8:
- rapidio: fixed issues pointed out by the kbuild test robot (use of an
uninitialized variable)
- vb2: rebased after recent changes in the code

v7: 
https://lore.kernel.org/linux-iommu/20200619103636.11974-1-m.szyprow...@samsung.com/T/
- changed DMA page interators to standard DMA SG iterators in drm/prime and
  videobuf2-dma-contig as suggested by Robin Murphy
- fixed build issues

v6: 
https://lore.kernel.org/linux-iommu/20200618153956.29558-1-m.szyprow...@samsung.com/T/
- rebased onto Linux next-20200618, which is based on v5.8-rc1; fixed conflicts

v5: 
https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprow...@samsung.com/T/
- fixed some minor style issues and typos
- fixed lack of the attrs argument in ion, dmabuf, rapidio, fastrpc and
  vfio patches

v4: https://lore.kernel.org/linux-iommu/20200512121931.gd20...@lst.de/T/
- added for_each_sgtable_* wrappers and applied where possible
- added drm_prime_get_contiguous_size() and applied where possible
- applied drm_prime_sg_to_page_addr_arrays() where possible to remove page
  extraction from sg_table objects
- added documentation for the introduced wrappers
- improved patches description a bit

v3: 
https://lore.kernel.org/dri-devel/20200505083926.28503-1-m.szyprow...@samsung.com/
- introduce dma_*_sgtable_* wrappers and use them in all patches

v2: 
https://lore.kernel.org/linux-iommu/c01c9766-9778-fd1f-f36e-2dc7bd376...@arm.com/T/
- dropped most of the changes to drm/i915
- added fixes for rcar-du, xen, media and ion
- fixed a few issues pointed out by the kbuild test robot
- added wide cc: list for each patch

v

[PATCH v9 17/32] drm: rockchip: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 23 +
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index 2970e534e2bb..cb50f2ba2e46 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -36,8 +36,8 @@ static int rockchip_gem_iommu_map(struct rockchip_gem_object 
*rk_obj)
 
rk_obj->dma_addr = rk_obj->mm.start;
 
-   ret = iommu_map_sg(private->domain, rk_obj->dma_addr, rk_obj->sgt->sgl,
-  rk_obj->sgt->nents, prot);
+   ret = iommu_map_sgtable(private->domain, rk_obj->dma_addr, rk_obj->sgt,
+   prot);
if (ret < rk_obj->base.size) {
DRM_ERROR("failed to map buffer: size=%zd request_size=%zd\n",
  ret, rk_obj->base.size);
@@ -98,11 +98,10 @@ static int rockchip_gem_get_pages(struct 
rockchip_gem_object *rk_obj)
 * TODO: Replace this by drm_clflush_sg() once it can be implemented
 * without relying on symbols that are not exported.
 */
-   for_each_sg(rk_obj->sgt->sgl, s, rk_obj->sgt->nents, i)
+   for_each_sgtable_sg(rk_obj->sgt, s, i)
sg_dma_address(s) = sg_phys(s);
 
-   dma_sync_sg_for_device(drm->dev, rk_obj->sgt->sgl, rk_obj->sgt->nents,
-  DMA_TO_DEVICE);
+   dma_sync_sgtable_for_device(drm->dev, rk_obj->sgt, DMA_TO_DEVICE);
 
return 0;
 
@@ -350,8 +349,8 @@ void rockchip_gem_free_object(struct drm_gem_object *obj)
if (private->domain) {
rockchip_gem_iommu_unmap(rk_obj);
} else {
-   dma_unmap_sg(drm->dev, rk_obj->sgt->sgl,
-rk_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(drm->dev, rk_obj->sgt,
+ DMA_BIDIRECTIONAL, 0);
}
drm_prime_gem_destroy(obj, rk_obj->sgt);
} else {
@@ -476,15 +475,13 @@ rockchip_gem_dma_map_sg(struct drm_device *drm,
struct sg_table *sg,
struct rockchip_gem_object *rk_obj)
 {
-   int count = dma_map_sg(drm->dev, sg->sgl, sg->nents,
-  DMA_BIDIRECTIONAL);
-   if (!count)
-   return -EINVAL;
+   int err = dma_map_sgtable(drm->dev, sg, DMA_BIDIRECTIONAL, 0);
+   if (err)
+   return err;
 
if (drm_prime_get_contiguous_size(sg) < attach->dmabuf->size) {
DRM_ERROR("failed to map sg_table to contiguous linear 
address.\n");
-   dma_unmap_sg(drm->dev, sg->sgl, sg->nents,
-DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(drm->dev, sg, DMA_BIDIRECTIONAL, 0);
return -EINVAL;
}
 
-- 
2.17.1



[PATCH v9 23/32] xen: gntdev: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
Acked-by: Juergen Gross 
---
 drivers/xen/gntdev-dmabuf.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
index b1b6eebafd5d..4c13cbc99896 100644
--- a/drivers/xen/gntdev-dmabuf.c
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -247,10 +247,9 @@ static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
 
if (sgt) {
if (gntdev_dmabuf_attach->dir != DMA_NONE)
-   dma_unmap_sg_attrs(attach->dev, sgt->sgl,
-  sgt->nents,
-  gntdev_dmabuf_attach->dir,
-  DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(attach->dev, sgt,
+ gntdev_dmabuf_attach->dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sgt);
}
 
@@ -288,8 +287,8 @@ dmabuf_exp_ops_map_dma_buf(struct dma_buf_attachment 
*attach,
sgt = dmabuf_pages_to_sgt(gntdev_dmabuf->pages,
  gntdev_dmabuf->nr_pages);
if (!IS_ERR(sgt)) {
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
+   if (dma_map_sgtable(attach->dev, sgt, dir,
+   DMA_ATTR_SKIP_CPU_SYNC)) {
sg_free_table(sgt);
kfree(sgt);
sgt = ERR_PTR(-ENOMEM);
@@ -633,7 +632,7 @@ dmabuf_imp_to_refs(struct gntdev_dmabuf_priv *priv, struct 
device *dev,
 
/* Now convert sgt to array of pages and check for page validity. */
i = 0;
-   for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0) {
+   for_each_sgtable_page(sgt, &sg_iter, 0) {
		struct page *page = sg_page_iter_page(&sg_iter);
/*
 * Check if page is valid: this can happen if we are given
-- 
2.17.1



[PATCH v9 03/32] drm: core: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
---
 drivers/gpu/drm/drm_cache.c|  2 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c | 14 +-
 drivers/gpu/drm/drm_prime.c| 11 ++-
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 03e01b000f7a..0fe3c496002a 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -127,7 +127,7 @@ drm_clflush_sg(struct sg_table *st)
struct sg_page_iter sg_iter;
 
mb(); /*CLFLUSH is ordered only by using memory barriers*/
-   for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
+   for_each_sgtable_page(st, &sg_iter, 0)
drm_clflush_page(sg_page_iter_page(_iter));
mb(); /*Make sure that all cache line entry is flushed*/
 
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 4b7cfbac4daa..47d8211221f2 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -126,8 +126,8 @@ void drm_gem_shmem_free_object(struct drm_gem_object *obj)
drm_prime_gem_destroy(obj, shmem->sgt);
} else {
if (shmem->sgt) {
-   dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl,
-shmem->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(obj->dev->dev, shmem->sgt,
+ DMA_BIDIRECTIONAL, 0);
sg_free_table(shmem->sgt);
kfree(shmem->sgt);
}
@@ -424,8 +424,7 @@ void drm_gem_shmem_purge_locked(struct drm_gem_object *obj)
 
WARN_ON(!drm_gem_shmem_is_purgeable(shmem));
 
-   dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl,
-shmem->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(obj->dev->dev, shmem->sgt, DMA_BIDIRECTIONAL, 0);
sg_free_table(shmem->sgt);
kfree(shmem->sgt);
shmem->sgt = NULL;
@@ -697,12 +696,17 @@ struct sg_table *drm_gem_shmem_get_pages_sgt(struct 
drm_gem_object *obj)
goto err_put_pages;
}
/* Map the pages for use by the h/w. */
-   dma_map_sg(obj->dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+   ret = dma_map_sgtable(obj->dev->dev, sgt, DMA_BIDIRECTIONAL, 0);
+   if (ret)
+   goto err_free_sgt;
 
shmem->sgt = sgt;
 
return sgt;
 
+err_free_sgt:
+   sg_free_table(sgt);
+   kfree(sgt);
 err_put_pages:
drm_gem_shmem_put_pages(shmem);
return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 5d181bf60a44..c45b0cc6e31d 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -617,6 +617,7 @@ struct sg_table *drm_gem_map_dma_buf(struct 
dma_buf_attachment *attach,
 {
struct drm_gem_object *obj = attach->dmabuf->priv;
struct sg_table *sgt;
+   int ret;
 
if (WARN_ON(dir == DMA_NONE))
return ERR_PTR(-EINVAL);
@@ -626,11 +627,12 @@ struct sg_table *drm_gem_map_dma_buf(struct 
dma_buf_attachment *attach,
else
sgt = obj->dev->driver->gem_prime_get_sg_table(obj);
 
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
+   ret = dma_map_sgtable(attach->dev, sgt, dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+   if (ret) {
sg_free_table(sgt);
kfree(sgt);
-   sgt = ERR_PTR(-E

[PATCH v9 26/32] dmabuf: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
Acked-by: Gerd Hoffmann 
---
 drivers/dma-buf/heaps/heap-helpers.c | 13 ++---
 drivers/dma-buf/udmabuf.c|  7 +++
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/dma-buf/heaps/heap-helpers.c 
b/drivers/dma-buf/heaps/heap-helpers.c
index 9f964ca3f59c..d0696cf937af 100644
--- a/drivers/dma-buf/heaps/heap-helpers.c
+++ b/drivers/dma-buf/heaps/heap-helpers.c
@@ -140,13 +140,12 @@ struct sg_table *dma_heap_map_dma_buf(struct 
dma_buf_attachment *attachment,
  enum dma_data_direction direction)
 {
struct dma_heaps_attachment *a = attachment->priv;
-   struct sg_table *table;
-
-   table = &a->table;
+   struct sg_table *table = &a->table;
+   int ret;
 
-   if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
-   direction))
-   table = ERR_PTR(-ENOMEM);
+   ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+   if (ret)
+   table = ERR_PTR(ret);
return table;
 }
 
@@ -154,7 +153,7 @@ static void dma_heap_unmap_dma_buf(struct 
dma_buf_attachment *attachment,
   struct sg_table *table,
   enum dma_data_direction direction)
 {
-   dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
+   dma_unmap_sgtable(attachment->dev, table, direction, 0);
 }
 
 static vm_fault_t dma_heap_vm_fault(struct vm_fault *vmf)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index acb26c627d27..89e293bd9252 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -63,10 +63,9 @@ static struct sg_table *get_sg_table(struct device *dev, 
struct dma_buf *buf,
GFP_KERNEL);
if (ret < 0)
goto err;
-   if (!dma_map_sg(dev, sg->sgl, sg->nents, direction)) {
-   ret = -EINVAL;
+   ret = dma_map_sgtable(dev, sg, direction, 0);
+   if (ret < 0)
goto err;
-   }
return sg;
 
 err:
@@ -78,7 +77,7 @@ static struct sg_table *get_sg_table(struct device *dev, 
struct dma_buf *buf,
 static void put_sg_table(struct device *dev, struct sg_table *sg,
 enum dma_data_direction direction)
 {
-   dma_unmap_sg(dev, sg->sgl, sg->nents, direction);
+   dma_unmap_sgtable(dev, sg, direction, 0);
sg_free_table(sg);
kfree(sg);
 }
-- 
2.17.1



[PATCH v9 16/32] drm: rockchip: use common helper for a scatterlist contiguity check

2020-08-26 Thread Marek Szyprowski
Use the common helper for checking the contiguity of the imported dma-buf.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index b9275ba7c5a5..2970e534e2bb 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -460,23 +460,6 @@ struct sg_table *rockchip_gem_prime_get_sg_table(struct 
drm_gem_object *obj)
return sgt;
 }
 
-static unsigned long rockchip_sg_get_contiguous_size(struct sg_table *sgt,
-int count)
-{
-   struct scatterlist *s;
-   dma_addr_t expected = sg_dma_address(sgt->sgl);
-   unsigned int i;
-   unsigned long size = 0;
-
-   for_each_sg(sgt->sgl, s, count, i) {
-   if (sg_dma_address(s) != expected)
-   break;
-   expected = sg_dma_address(s) + sg_dma_len(s);
-   size += sg_dma_len(s);
-   }
-   return size;
-}
-
 static int
 rockchip_gem_iommu_map_sg(struct drm_device *drm,
  struct dma_buf_attachment *attach,
@@ -498,7 +481,7 @@ rockchip_gem_dma_map_sg(struct drm_device *drm,
if (!count)
return -EINVAL;
 
-   if (rockchip_sg_get_contiguous_size(sg, count) < attach->dmabuf->size) {
+   if (drm_prime_get_contiguous_size(sg) < attach->dmabuf->size) {
DRM_ERROR("failed to map sg_table to contiguous linear 
address.\n");
dma_unmap_sg(drm->dev, sg->sgl, sg->nents,
 DMA_BIDIRECTIONAL);
-- 
2.17.1



[PATCH v9 21/32] drm: vmwgfx: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
Acked-by: Roland Scheidegger 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index ab524ab3b0b4..f2f2bff1eedf 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -362,8 +362,7 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
*vmw_tt)
 {
struct device *dev = vmw_tt->dev_priv->dev->dev;
 
-   dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
-   DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(dev, &vmw_tt->sgt, DMA_BIDIRECTIONAL, 0);
vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
 }
 
@@ -383,16 +382,8 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
*vmw_tt)
 static int vmw_ttm_map_for_dma(struct vmw_ttm_tt *vmw_tt)
 {
struct device *dev = vmw_tt->dev_priv->dev->dev;
-   int ret;
-
-   ret = dma_map_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.orig_nents,
-DMA_BIDIRECTIONAL);
-   if (unlikely(ret == 0))
-   return -ENOMEM;
 
-   vmw_tt->sgt.nents = ret;
-
-   return 0;
+   return dma_map_sgtable(dev, &vmw_tt->sgt, DMA_BIDIRECTIONAL, 0);
 }
 
 /**
@@ -449,10 +440,10 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
if (unlikely(ret != 0))
goto out_sg_alloc_fail;
 
-   if (vsgt->num_pages > vmw_tt->sgt.nents) {
+   if (vsgt->num_pages > vmw_tt->sgt.orig_nents) {
uint64_t over_alloc =
sgl_size * (vsgt->num_pages -
-   vmw_tt->sgt.nents);
+   vmw_tt->sgt.orig_nents);
 
ttm_mem_global_free(glob, over_alloc);
vmw_tt->sg_alloc_size -= over_alloc;
-- 
2.17.1



[PATCH v9 19/32] drm: v3d: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
Reviewed-by: Eric Anholt 
---
 drivers/gpu/drm/v3d/v3d_mmu.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 3b81ea28c0bb..5a453532901f 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -90,18 +90,17 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev);
u32 page = bo->node.start;
u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
-   unsigned int count;
-   struct scatterlist *sgl;
+   struct sg_dma_page_iter dma_iter;
 
-   for_each_sg(shmem_obj->sgt->sgl, sgl, shmem_obj->sgt->nents, count) {
-   u32 page_address = sg_dma_address(sgl) >> V3D_MMU_PAGE_SHIFT;
+   for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
+   dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
+   u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT;
u32 pte = page_prot | page_address;
u32 i;
 
-   BUG_ON(page_address + (sg_dma_len(sgl) >> V3D_MMU_PAGE_SHIFT) >=
+   BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
   BIT(24));
-
-   for (i = 0; i < sg_dma_len(sgl) >> V3D_MMU_PAGE_SHIFT; i++)
+   for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
v3d->pt[page++] = pte + i;
}
 
-- 
2.17.1



[PATCH v9 08/32] drm: i915: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

This driver creatively uses sg_table->orig_nents to store the size of the
allocated scatterlist and ignores the number of entries returned by the
dma_map_sg() function. The sg_table->orig_nents is (mis)used to properly
free the (over)allocated scatterlist.

This patch only introduces the common DMA-mapping wrappers operating
directly on the struct sg_table objects to the dmabuf related functions,
so the other drivers, which might share buffers with i915, could rely on
the properly set nents and orig_nents values.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   | 11 +++
 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c |  7 +++
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 2679380159fc..8a988592715b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -48,12 +48,9 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
dma_buf_attachment *attachme
src = sg_next(src);
}
 
-   if (!dma_map_sg_attrs(attachment->dev,
- st->sgl, st->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
-   ret = -ENOMEM;
+   ret = dma_map_sgtable(attachment->dev, st, dir, DMA_ATTR_SKIP_CPU_SYNC);
+   if (ret)
goto err_free_sg;
-   }
 
return st;
 
@@ -73,9 +70,7 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment 
*attachment,
 {
struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
 
-   dma_unmap_sg_attrs(attachment->dev,
-  sg->sgl, sg->nents, dir,
-  DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c 
b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
index debaf7b18ab5..be30b27e2926 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
@@ -28,10 +28,9 @@ static struct sg_table *mock_map_dma_buf(struct 
dma_buf_attachment *attachment,
sg = sg_next(sg);
}
 
-   if (!dma_map_sg(attachment->dev, st->sgl, st->nents, dir)) {
-   err = -ENOMEM;
+   err = dma_map_sgtable(attachment->dev, st, dir, 0);
+   if (err)
goto err_st;
-   }
 
return st;
 
@@ -46,7 +45,7 @@ static void mock_unmap_dma_buf(struct dma_buf_attachment 
*attachment,
   struct sg_table *st,
   enum dma_data_direction dir)
 {
-   dma_unmap_sg(attachment->dev, st->sgl, st->nents, dir);
+   dma_unmap_sgtable(attachment->dev, st, dir, 0);
sg_free_table(st);
kfree(st);
 }
-- 
2.17.1



[PATCH v9 29/32] rapidio: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of created entries in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, commonly used in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse the nents and
orig_nents entries, calling DMA-mapping functions with the wrong number of
entries or ignoring the number of mapped entries returned by the
dma_map_sg() function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.

Signed-off-by: Marek Szyprowski 
---
 drivers/rapidio/devices/rio_mport_cdev.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/rapidio/devices/rio_mport_cdev.c 
b/drivers/rapidio/devices/rio_mport_cdev.c
index a30342942e26..89eb3d212652 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -573,8 +573,7 @@ static void dma_req_free(struct kref *ref)
refcount);
struct mport_cdev_priv *priv = req->priv;
 
-   dma_unmap_sg(req->dmach->device->dev,
-req->sgt.sgl, req->sgt.nents, req->dir);
+   dma_unmap_sgtable(req->dmach->device->dev, &req->sgt, req->dir, 0);
	sg_free_table(&req->sgt);
if (req->page_list) {
unpin_user_pages(req->page_list, req->nr_pages);
@@ -814,7 +813,6 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
struct mport_dev *md = priv->md;
struct dma_chan *chan;
int ret;
-   int nents;
 
if (xfer->length == 0)
return -EINVAL;
@@ -930,15 +928,14 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
xfer->offset, xfer->length);
}
 
-   nents = dma_map_sg(chan->device->dev,
-  req->sgt.sgl, req->sgt.nents, dir);
-   if (nents == 0) {
+   ret = dma_map_sgtable(chan->device->dev, &req->sgt, dir, 0);
+   if (ret) {
rmcd_error("Failed to map SG list");
ret = -EFAULT;
goto err_pg;
}
 
-   ret = do_dma_request(req, xfer, sync, nents);
+   ret = do_dma_request(req, xfer, sync, req->sgt.nents);
 
if (ret >= 0) {
if (sync == RIO_TRANSFER_ASYNC)
-- 
2.17.1



[PATCH v9 01/32] drm: prime: add common helper to check scatterlist contiguity

2020-08-26 Thread Marek Szyprowski
It is a common operation done by DRM drivers to check the contiguity
of the DMA-mapped buffer described by a scatterlist in the
sg_table object. Let's add a common helper for this operation.
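
A hypothetical import callback using the new helper could look like this
(example_obj_create() is a stand-in, not a real API):

	static struct drm_gem_object *
	example_import_sg_table(struct drm_device *dev,
				struct dma_buf_attachment *attach,
				struct sg_table *sgt)
	{
		/* reject buffers that are not contiguous in the DMA address space */
		if (drm_prime_get_contiguous_size(sgt) < attach->dmabuf->size)
			return ERR_PTR(-EINVAL);

		return example_obj_create(dev, sg_dma_address(sgt->sgl),
					  attach->dmabuf->size);
	}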

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 23 +++--
 drivers/gpu/drm/drm_prime.c  | 31 
 include/drm/drm_prime.h  |  2 ++
 3 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index 822edeadbab3..59b9ca207b42 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -471,26 +471,9 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
 {
struct drm_gem_cma_object *cma_obj;
 
-   if (sgt->nents != 1) {
-   /* check if the entries in the sg_table are contiguous */
-   dma_addr_t next_addr = sg_dma_address(sgt->sgl);
-   struct scatterlist *s;
-   unsigned int i;
-
-   for_each_sg(sgt->sgl, s, sgt->nents, i) {
-   /*
-* sg_dma_address(s) is only valid for entries
-* that have sg_dma_len(s) != 0
-*/
-   if (!sg_dma_len(s))
-   continue;
-
-   if (sg_dma_address(s) != next_addr)
-   return ERR_PTR(-EINVAL);
-
-   next_addr = sg_dma_address(s) + sg_dma_len(s);
-   }
-   }
+   /* check if the entries in the sg_table are contiguous */
+   if (drm_prime_get_contiguous_size(sgt) < attach->dmabuf->size)
+   return ERR_PTR(-EINVAL);
 
/* Create a CMA GEM buffer. */
cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 1693aa7c14b5..4ed5ed1f078c 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -825,6 +825,37 @@ struct sg_table *drm_prime_pages_to_sg(struct page 
**pages, unsigned int nr_page
 }
 EXPORT_SYMBOL(drm_prime_pages_to_sg);
 
+/**
+ * drm_prime_get_contiguous_size - returns the contiguous size of the buffer
+ * @sgt: sg_table describing the buffer to check
+ *
+ * This helper calculates the contiguous size in the DMA address space
+ * of the buffer described by the provided sg_table.
+ *
+ * This is useful for implementing
+ * _gem_object_funcs.gem_prime_import_sg_table.
+ */
+unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
+{
+   dma_addr_t expected = sg_dma_address(sgt->sgl);
+   struct scatterlist *sg;
+   unsigned long size = 0;
+   int i;
+
+   for_each_sgtable_dma_sg(sgt, sg, i) {
+   unsigned int len = sg_dma_len(sg);
+
+   if (!len)
+   break;
+   if (sg_dma_address(sg) != expected)
+   break;
+   expected += len;
+   size += len;
+   }
+   return size;
+}
+EXPORT_SYMBOL(drm_prime_get_contiguous_size);
+
 /**
  * drm_gem_prime_export - helper library implementation of the export callback
  * @obj: GEM object to export
diff --git a/include/drm/drm_prime.h b/include/drm/drm_prime.h
index 9af7422b44cf..47ef11614627 100644
--- a/include/drm/drm_prime.h
+++ b/include/drm/drm_prime.h
@@ -92,6 +92,8 @@ struct sg_table *drm_prime_pages_to_sg(struct page **pages, unsigned int nr_page
 struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
 int flags);
 
+unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt);
+
 /* helper functions for importing */
 struct drm_gem_object *drm_gem_prime_import_dev(struct drm_device *dev,
struct dma_buf *dma_buf,
-- 
2.17.1



[PATCH v9 07/32] drm: exynos: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of entries created in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.
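
As a minimal sketch of the converted pattern (illustrative only; dev and
sgt are assumed to come from the caller, and the function name is made
up):

    #include <linux/dma-mapping.h>

    static int my_do_dma(struct device *dev, struct sg_table *sgt)
    {
            int ret;

            /* Maps sgt->orig_nents entries and stores the number of
             * mapped entries in sgt->nents; returns 0 or -errno, so a
             * failed mapping can no longer be silently ignored. */
            ret = dma_map_sgtable(dev, sgt, DMA_TO_DEVICE, 0);
            if (ret)
                    return ret;

            /* ... run the DMA operation ... */

            /* Internally unmaps with sgt->orig_nents, as the DMA API
             * requires, so the wrong-count mistake cannot happen. */
            dma_unmap_sgtable(dev, sgt, DMA_TO_DEVICE, 0);
            return 0;
    }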

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
Acked-by: Inki Dae 
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 03be31427181..967a5cdc120e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -395,8 +395,8 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
return;
 
 out:
-   dma_unmap_sg(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt->sgl,
-   g2d_userptr->sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
+ DMA_BIDIRECTIONAL, 0);
 
pages = frame_vector_pages(g2d_userptr->vec);
if (!IS_ERR(pages)) {
@@ -511,10 +511,10 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 
g2d_userptr->sgt = sgt;
 
-   if (!dma_map_sg(to_dma_dev(g2d->drm_dev), sgt->sgl, sgt->nents,
-   DMA_BIDIRECTIONAL)) {
+   ret = dma_map_sgtable(to_dma_dev(g2d->drm_dev), sgt,
+ DMA_BIDIRECTIONAL, 0);
+   if (ret) {
DRM_DEV_ERROR(g2d->dev, "failed to map sgt with dma region.\n");
-   ret = -ENOMEM;
goto err_sg_free_table;
}
 
-- 
2.17.1



[PATCH v9 02/32] drm: prime: use sgtable iterators in drm_prime_sg_to_page_addr_arrays()

2020-08-26 Thread Marek Szyprowski
Replace the current hand-crafted code for extracting pages and DMA
addresses from the given scatterlist with the much more robust code based
on the generic scatterlist iterators and the recently introduced
sg_table-based wrappers. The resulting code is simple and easy to
understand, so the comment describing the old code is no longer needed.
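
For reference, the two iterator flavours used below walk different sides
of the table; a short sketch, assuming sgt has already been allocated and
DMA-mapped:

    struct sg_page_iter page_iter;
    struct sg_dma_page_iter dma_iter;

    /* Walks the CPU pages, i.e. the sgt->orig_nents side of the table. */
    for_each_sgtable_page(sgt, &page_iter, 0) {
            struct page *page = sg_page_iter_page(&page_iter);
            /* ... use the CPU page ... */
    }

    /* Walks PAGE_SIZE chunks of the DMA mapping (the sgt->nents side);
     * only valid once the table has been DMA-mapped. */
    for_each_sgtable_dma_page(sgt, &dma_iter, 0) {
            dma_addr_t addr = sg_page_iter_dma_address(&dma_iter);
            /* ... use the DMA address ... */
    }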

Signed-off-by: Marek Szyprowski 
Reviewed-by: Andrzej Hajda 
---
 drivers/gpu/drm/drm_prime.c | 49 -
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 4ed5ed1f078c..5d181bf60a44 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -990,45 +990,26 @@ EXPORT_SYMBOL(drm_gem_prime_import);
 int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
 dma_addr_t *addrs, int max_entries)
 {
-   unsigned count;
-   struct scatterlist *sg;
-   struct page *page;
-   u32 page_len, page_index;
-   dma_addr_t addr;
-   u32 dma_len, dma_index;
-
-   /*
-* Scatterlist elements contains both pages and DMA addresses, but
-* one shoud not assume 1:1 relation between them. The sg->length is
-* the size of the physical memory chunk described by the sg->page,
-* while sg_dma_len(sg) is the size of the DMA (IO virtual) chunk
-* described by the sg_dma_address(sg).
-*/
-   page_index = 0;
-   dma_index = 0;
-   for_each_sg(sgt->sgl, sg, sgt->nents, count) {
-   page_len = sg->length;
-   page = sg_page(sg);
-   dma_len = sg_dma_len(sg);
-   addr = sg_dma_address(sg);
-
-   while (pages && page_len > 0) {
-   if (WARN_ON(page_index >= max_entries))
+   struct sg_dma_page_iter dma_iter;
+   struct sg_page_iter page_iter;
+   struct page **p = pages;
+   dma_addr_t *a = addrs;
+
+   if (pages) {
+   for_each_sgtable_page(sgt, &page_iter, 0) {
+   if (p - pages >= max_entries)
return -1;
-   pages[page_index] = page;
-   page++;
-   page_len -= PAGE_SIZE;
-   page_index++;
+   *p++ = sg_page_iter_page(&page_iter);
}
-   while (addrs && dma_len > 0) {
-   if (WARN_ON(dma_index >= max_entries))
+   }
+   if (addrs) {
+   for_each_sgtable_dma_page(sgt, &dma_iter, 0) {
+   if (a - addrs >= max_entries)
return -1;
-   addrs[dma_index] = addr;
-   addr += PAGE_SIZE;
-   dma_len -= PAGE_SIZE;
-   dma_index++;
+   *a++ = sg_page_iter_dma_address(&dma_iter);
}
}
+
return 0;
 }
 EXPORT_SYMBOL(drm_prime_sg_to_page_addr_arrays);
-- 
2.17.1



[PATCH v9 09/32] drm: lima: fix common struct sg_table related issues

2020-08-26 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of entries created in the DMA address space. However,
the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg()
must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on struct sg_table objects and use scatterlist page iterators
where possible. This almost always hides references to the nents and
orig_nents entries, making the code robust, easier to follow and
copy/paste safe.
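
The sync helpers mentioned above have sg_table variants as well; they
take the whole table and pick the right entry count internally. A hedged
sketch, with dev, sgt and the transfer direction assumed from context:

    /* Hand the buffer to the CPU before reading or writing it ... */
    dma_sync_sgtable_for_cpu(dev, sgt, DMA_BIDIRECTIONAL);
    /* ... CPU accesses the buffer here ... */
    /* ... then hand it back to the device before the next DMA. */
    dma_sync_sgtable_for_device(dev, sgt, DMA_BIDIRECTIONAL);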

Signed-off-by: Marek Szyprowski 
Reviewed-by: Qiang Yu 
---
 drivers/gpu/drm/lima/lima_gem.c | 11 ---
 drivers/gpu/drm/lima/lima_vm.c  |  5 ++---
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index 155f2b4b4030..11223fe348df 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -69,8 +69,7 @@ int lima_heap_alloc(struct lima_bo *bo, struct lima_vm *vm)
return ret;
 
if (bo->base.sgt) {
-   dma_unmap_sg(dev, bo->base.sgt->sgl,
-                bo->base.sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sgtable(dev, bo->base.sgt, DMA_BIDIRECTIONAL, 0);
sg_free_table(bo->base.sgt);
} else {
bo->base.sgt = kmalloc(sizeof(*bo->base.sgt), GFP_KERNEL);
@@ -80,7 +79,13 @@ int lima_heap_alloc(struct lima_bo *bo, struct lima_vm *vm)
}
}
 
-   dma_map_sg(dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL);
+   ret = dma_map_sgtable(dev, &sgt, DMA_BIDIRECTIONAL, 0);
+   if (ret) {
+   sg_free_table(&sgt);
+   kfree(bo->base.sgt);
+   bo->base.sgt = NULL;
+   return ret;
+   }
 
*bo->base.sgt = sgt;
 
diff --git a/drivers/gpu/drm/lima/lima_vm.c b/drivers/gpu/drm/lima/lima_vm.c
index 5b92fb82674a..2b2739adc7f5 100644
--- a/drivers/gpu/drm/lima/lima_vm.c
+++ b/drivers/gpu/drm/lima/lima_vm.c
@@ -124,7 +124,7 @@ int lima_vm_bo_add(struct lima_vm *vm, struct lima_bo *bo, bool create)
if (err)
goto err_out1;
 
-   for_each_sg_dma_page(bo->base.sgt->sgl, &sg_iter, bo->base.sgt->nents, 0) {
+   for_each_sgtable_dma_page(bo->base.sgt, &sg_iter, 0) {
err = lima_vm_map_page(vm, sg_page_iter_dma_address(&sg_iter),
   bo_va->node.start + offset);
if (err)
@@ -298,8 +298,7 @@ int lima_vm_map_bo(struct lima_vm *vm, struct lima_bo *bo, int pageoff)
mutex_lock(&vm->lock);
 
base = bo_va->node.start + (pageoff << PAGE_SHIFT);
-   for_each_sg_dma_page(bo->base.sgt->sgl, &sg_iter,
-                bo->base.sgt->nents, pageoff) {
+   for_each_sgtable_dma_page(bo->base.sgt, &sg_iter, pageoff) {
err = lima_vm_map_page(vm, sg_page_iter_dma_address(&sg_iter),
   base + offset);
if (err)
-- 
2.17.1



[PATCH v9 13/32] drm: omapdrm: use common helper for extracting pages array

2020-08-26 Thread Marek Szyprowski
Use the common helper for converting an sg_table object into a struct
page pointer array.
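
For context (a sketch, not part of the patch): the helper fills whichever
destination array is non-NULL, so a caller that only needs the CPU pages
can pass NULL for the DMA addresses. Per its definition in patch 02/32 it
returns 0 on success and -1 when max_entries is too small. The function
name my_collect_pages is hypothetical:

    static int my_collect_pages(struct sg_table *sgt, size_t size)
    {
            unsigned int npages = DIV_ROUND_UP(size, PAGE_SIZE);
            struct page **pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);

            if (!pages)
                    return -ENOMEM;
            /* NULL addrs: only the CPU pages are collected. */
            if (drm_prime_sg_to_page_addr_arrays(sgt, pages, NULL, npages)) {
                    kfree(pages);
                    return -EINVAL;         /* more pages than expected */
            }
            /* ... store 'pages' in the driver object, kfree() it later ... */
            return 0;
    }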

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/omapdrm/omap_gem.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index d0d12d5dd76c..ff0c4b0c3fd0 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -1297,10 +1297,9 @@ struct drm_gem_object *omap_gem_new_dmabuf(struct drm_device *dev, size_t size,
omap_obj->dma_addr = sg_dma_address(sgt->sgl);
} else {
/* Create pages list from sgt */
-   struct sg_page_iter iter;
struct page **pages;
unsigned int npages;
-   unsigned int i = 0;
+   unsigned int ret;
 
npages = DIV_ROUND_UP(size, PAGE_SIZE);
pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
@@ -1311,14 +1310,9 @@ struct drm_gem_object *omap_gem_new_dmabuf(struct drm_device *dev, size_t size,
}
 
omap_obj->pages = pages;
-
-   for_each_sg_page(sgt->sgl, &iter, sgt->orig_nents, 0) {
-   pages[i++] = sg_page_iter_page(&iter);
-   if (i > npages)
-   break;
-   }
-
-   if (WARN_ON(i != npages)) {
+   ret = drm_prime_sg_to_page_addr_arrays(sgt, pages, NULL,
+  npages);
+   if (WARN_ON(ret)) {
omap_gem_free_object(obj);
obj = ERR_PTR(-ENOMEM);
goto done;
-- 
2.17.1



[PATCH v9 11/32] drm: mediatek: use common helper for extracting pages array

2020-08-26 Thread Marek Szyprowski
Use the common helper for converting an sg_table object into a struct
page pointer array.
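
As context (a sketch under assumed surroundings, not part of the patch):
a typical consumer converts the table into a page array precisely so it
can build a contiguous kernel mapping, roughly as the driver below does.
The size is assumed page-aligned and my_vmap is a made-up name:

    static void *my_vmap(struct sg_table *sgt, size_t size)
    {
            unsigned int npages = size >> PAGE_SHIFT;
            struct page **pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);

            if (!pages)
                    return NULL;
            drm_prime_sg_to_page_addr_arrays(sgt, pages, NULL, npages);
            /* Map the collected pages into one kernel virtual range;
             * keep 'pages' around until the mapping is torn down. */
            return vmap(pages, npages, VM_MAP,
                        pgprot_writecombine(PAGE_KERNEL));
    }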

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/mediatek/mtk_drm_gem.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
index 3654ec732029..0583e557ad37 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -233,9 +233,7 @@ void *mtk_drm_gem_prime_vmap(struct drm_gem_object *obj)
 {
struct mtk_drm_gem_obj *mtk_gem = to_mtk_gem_obj(obj);
struct sg_table *sgt;
-   struct sg_page_iter iter;
unsigned int npages;
-   unsigned int i = 0;
 
if (mtk_gem->kvaddr)
return mtk_gem->kvaddr;
@@ -249,11 +247,8 @@ void *mtk_drm_gem_prime_vmap(struct drm_gem_object *obj)
if (!mtk_gem->pages)
goto out;
 
-   for_each_sg_page(sgt->sgl, &iter, sgt->orig_nents, 0) {
-   mtk_gem->pages[i++] = sg_page_iter_page(&iter);
-   if (i > npages)
-   break;
-   }
+   drm_prime_sg_to_page_addr_arrays(sgt, mtk_gem->pages, NULL, npages);
+
mtk_gem->kvaddr = vmap(mtk_gem->pages, npages, VM_MAP,
   pgprot_writecombine(PAGE_KERNEL));
 
-- 
2.17.1


