Re: [PATCH 8/8] iommu: Move default domain allocation to iommu_group_get_for_dev()

2015-10-30 Thread Joerg Roedel
Hi Will,

On Thu, Oct 29, 2015 at 06:22:49PM +, Will Deacon wrote:
> The call to iommu_group_get_for_dev in arm_smmu_add_device will end up
> calling __iommu_attach_device, since group->domain will now be initialised
> by the code above. This means the SMMU driver will see an ->attach_dev
> call for a device that is part-way through an ->add_device callback and
> will be missing the initialisation necessary for us to identify the SMMU
> instance to which it corresponds. In fact, the iommudata for the group
> won't be initialised at all, so the whole thing will fail afaict.
> 
> Note that I haven't actually taken this for a spin, so I could be missing
> something.

Yeah, I haven't looked at how to convert the ARM-SMMU drivers to default
domains yet, so the issue you describe above is totally possible.

But there is no way to trigger it yet, because your domain_alloc
function cannot yet allocate IOMMU_DOMAIN_DMA domains. The issue must be
fixed during the conversion, of course.
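
For illustration, the conversion hinges on the driver's domain_alloc
callback accepting the new domain type. A hypothetical sketch (made-up
my_smmu_* names, not the actual arm-smmu code):

static struct iommu_domain *my_smmu_domain_alloc(unsigned type)
{
	struct my_smmu_domain *smmu_domain;	/* hypothetical driver struct */

	/* Accepting IOMMU_DOMAIN_DMA is what enables default DMA domains */
	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
		return NULL;

	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
	if (!smmu_domain)
		return NULL;

	/* ... hardware-specific initialisation would go here ... */
	return &smmu_domain->domain;
}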

I tested this patch-set on an AMD Seattle system and it worked fine
there.


Joerg



Re: [PATCH v6 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-10-30 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 09:17:52AM +0800, Daniel Kurtz wrote:
> Hmm, I thought the DMA API maps a (possibly) non-contiguous set of
> memory pages into a contiguous block in device memory address space.
> This would allow passing a dma mapped buffer to device dma using just
> a device address and length.

If you are speaking of the dma_map_sg interface, then there is absolutely
no guarantee from the API side that the buffers you pass in will end up
mapped contiguously. IOMMU drivers handle this differently, and when there
is no IOMMU at all there is also no way to map these buffers together.
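
For illustration, a caller that honours this walks the mapped list and
programs one descriptor per segment, rather than assuming one contiguous
block. A minimal sketch (my_hw_queue_descriptor() is a hypothetical
helper):

	struct scatterlist *s;
	int i, mapped;

	mapped = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);
	if (!mapped)
		return -ENOMEM;

	/* Iterate over 'mapped', which may be smaller than nents */
	for_each_sg(sgl, s, mapped, i)
		my_hw_queue_descriptor(dev, sg_dma_address(s), sg_dma_len(s));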

> So, is videobuf2-dma-contig.c based on an incorrect assumption
> about how the DMA API is supposed to work?

If it makes the above assumption, then yes.



Joerg



Re: [PATCH v6 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-10-30 Thread Robin Murphy

Hi Dan,

On 30/10/15 01:17, Daniel Kurtz wrote:

+linux-media & VIDEOBUF2 FRAMEWORK maintainers, since this is about
videobuf2-dma-contig's usage of the DMA API.

Hi Robin,

On Tue, Oct 27, 2015 at 12:55 AM, Robin Murphy  wrote:

On 26/10/15 13:44, Yong Wu wrote:


On Thu, 2015-10-01 at 20:13 +0100, Robin Murphy wrote:
[...]


+/*
+ * The DMA API client is passing in a scatterlist which could describe
+ * any old buffer layout, but the IOMMU API requires everything to be
+ * aligned to IOMMU pages. Hence the need for this complicated bit of
+ * impedance-matching, to be able to hand off a suitably-aligned list,
+ * but still preserve the original offsets and sizes for the caller.
+ */
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->iova_cookie;
+	struct iova *iova;
+	struct scatterlist *s, *prev = NULL;
+	dma_addr_t dma_addr;
+	size_t iova_len = 0;
+	int i;
+
+	/*
+	 * Work out how much IOVA space we need, and align the segments to
+	 * IOVA granules for the IOMMU driver to handle. With some clever
+	 * trickery we can modify the list in-place, but reversibly, by
+	 * hiding the original data in the as-yet-unused DMA fields.
+	 */
+	for_each_sg(sg, s, nents, i) {
+		size_t s_offset = iova_offset(iovad, s->offset);
+		size_t s_length = s->length;
+
+		sg_dma_address(s) = s->offset;
+		sg_dma_len(s) = s_length;
+		s->offset -= s_offset;
+		s_length = iova_align(iovad, s_length + s_offset);
+		s->length = s_length;
+
+		/*
+		 * The simple way to avoid the rare case of a segment
+		 * crossing the boundary mask is to pad the previous one
+		 * to end at a naturally-aligned IOVA for this one's size,
+		 * at the cost of potentially over-allocating a little.
+		 */
+		if (prev) {
+			size_t pad_len = roundup_pow_of_two(s_length);
+
+			pad_len = (pad_len - iova_len) & (pad_len - 1);
+			prev->length += pad_len;



Hi Robin,
While doing our v4l2 testing, it seems that we have hit a problem here.
We update prev->length again at this point; do we need to update
sg_dma_len(prev) too?

Functions like vb2_dc_get_contiguous_size[1] always compare
sg_dma_len(s) instead of s->length, so they may break unexpectedly
when sg_dma_len(s) is not the same as s->length.
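
For reference, the check in question amounts to summing segments only
while each one starts exactly where the previous one ended. A sketch of
that logic (paraphrasing videobuf2-dma-contig; details may differ by
kernel version):

static unsigned long contiguous_size_sketch(struct sg_table *sgt)
{
	struct scatterlist *s;
	dma_addr_t expected = sg_dma_address(sgt->sgl);
	unsigned long size = 0;
	unsigned int i;

	for_each_sg(sgt->sgl, s, sgt->nents, i) {
		/* Stop at the first gap in DMA address space */
		if (sg_dma_address(s) != expected)
			break;
		expected = sg_dma_address(s) + sg_dma_len(s);
		size += sg_dma_len(s);
	}
	return size;
}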



This is just tweaking the faked-up length that we hand off to iommu_map_sg()
(see also the iova_align() above), to trick it into bumping this segment up
to a suitable starting IOVA. The real length at this point is stashed in
sg_dma_len(s), and will be copied back into s->length in __finalise_sg(), so
both will hold the same true length once we return to the caller.
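
In other words, the un-swizzling on the way out looks roughly like this
(a sketch consistent with the description above, not the actual
__finalise_sg()):

static void finalise_sg_sketch(struct scatterlist *sg, int nents,
			       dma_addr_t dma_addr)
{
	struct scatterlist *s;
	int i;

	for_each_sg(sg, s, nents, i) {
		/* The caller's original offset/length were stashed in the DMA fields */
		unsigned int s_offset = sg_dma_address(s);
		unsigned int s_length = sg_dma_len(s);
		unsigned int s_iova_len = s->length;	/* padded, IOVA-aligned length */

		/* Restore the CPU view... */
		s->offset = s_offset;
		s->length = s_length;
		/* ...and fill in the real DMA address and length */
		sg_dma_address(s) = dma_addr + s_offset;
		sg_dma_len(s) = s_length;
		dma_addr += s_iova_len;
	}
}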

Yes, it does mean that if you have a list where the segment lengths are page
aligned but not monotonically decreasing, e.g. {64k, 16k, 64k}, then you'll
still end up with a gap between the second and third segments, but that's
fine because the DMA API offers no guarantees about what the resulting DMA
addresses will be (consider the no-IOMMU case where they would each just be
"mapped" to their physical address). If that breaks v4l, then it's probably
v4l's DMA API use that needs looking at (again).
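
To make the padding concrete, here is the arithmetic for that
{64k, 16k, 64k} example (a worked example for illustration; assumes a 4k
IOVA granule, so all three lengths are already granule-aligned):

	seg 0: length 64k -> no previous segment to pad; iova_len = 64k
	seg 1: length 16k -> pad_len = roundup_pow_of_two(16k) = 16k
	       pad_len = (16k - 64k) & (16k - 1) = 0  (64k is already 16k-aligned)
	       iova_len = 80k
	seg 2: length 64k -> pad_len = roundup_pow_of_two(64k) = 64k
	       pad_len = (64k - 80k) & (64k - 1) = 48k
	       seg 1 is padded so that seg 2 starts 64k-aligned at 128k,
	       which is exactly the gap described above; iova_len = 192k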


Hmm, I thought the DMA API maps a (possibly) non-contiguous set of
memory pages into a contiguous block in device memory address space.
This would allow passing a dma mapped buffer to device dma using just
a device address and length.


Not at all. The streaming DMA API (dma_map_* and friends) has two 
responsibilities: performing any necessary cache maintenance to ensure 
the device will correctly see data from the CPU, and the CPU will 
correctly see data from the device; and working out an address for that 
buffer from the device's point of view to actually hand off to the 
hardware (which is perfectly well allowed to fail).
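
That contract in its simplest form, for a single buffer (a minimal
sketch; checking for failure is the part callers most often forget):

	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, dma))	/* the mapping is allowed to fail */
		return -ENOMEM;

	/* ... hand 'dma' to the hardware and run the transfer ... */

	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);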


Consider SWIOTLB's implementation - segments which already lie at 
physical addresses within the device's DMA mask just get passed through, 
while those that lie outside it get mapped into the bounce buffer, but 
still as individual allocations (arch code just handles cache 
maintenance on the resulting physical addresses and can apply any 
hard-wired DMA offset for the device concerned).
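
Schematically, that decision looks something like this (a simplified
sketch, with a hypothetical bounce_map() standing in for
swiotlb_tbl_map_single(); not SWIOTLB's actual code):

static dma_addr_t swiotlb_ish_map(struct device *dev, phys_addr_t phys,
				  size_t size)
{
	dma_addr_t dev_addr = phys_to_dma(dev, phys);	/* applies any fixed offset */

	/* Already reachable by the device: pass straight through, no copying */
	if (dma_capable(dev, dev_addr, size))
		return dev_addr;

	/* Otherwise copy into a pre-allocated buffer below the DMA mask */
	return phys_to_dma(dev, bounce_map(dev, phys, size));
}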



IIUC, the change above breaks this model by inserting gaps in how the
buffer is mapped to device memory, such that the buffer is no longer
contiguous in dma address space.


Even the existing arch/arm IOMMU DMA code which I guess this implicitly 
relies on doesn't guarantee that behaviour - if the mapping happens to 
reach one of the segment length/boundary limits it won't just 

Re: [PATCH v6 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-10-30 Thread Mark Hounschell

On 10/30/2015 10:09 AM, Joerg Roedel wrote:

On Fri, Oct 30, 2015 at 09:17:52AM +0800, Daniel Kurtz wrote:

Hmm, I thought the DMA API maps a (possibly) non-contiguous set of
memory pages into a contiguous block in device memory address space.
This would allow passing a dma mapped buffer to device dma using just
a device address and length.


If you are speaking of the dma_map_sg interface, then there is absolutely
no guarantee from the API side that the buffers you pass in will end up
mapped contiguously. IOMMU drivers handle this differently, and when there
is no IOMMU at all there is also no way to map these buffers together.



That is what CMA is for, you know. It makes the buffer physically contiguous.

Mark
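
For context: CMA sits behind the coherent allocator rather than the
streaming dma_map_* API, so the way to get a guaranteed physically
contiguous buffer is to allocate one up front. A minimal sketch (on ARM,
large allocations like this can be backed by CMA when it is enabled):

	void *cpu_addr;
	dma_addr_t dma_handle;

	/* One physically contiguous, device-reachable buffer, possibly from CMA */
	cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
	if (!cpu_addr)
		return -ENOMEM;

	/* ... dma_handle for the device, cpu_addr for the CPU ... */

	dma_free_coherent(dev, size, cpu_addr, dma_handle);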