Re: [PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2022-06-30 Thread Logan Gunthorpe


On 2022-06-30 08:56, Robin Murphy wrote:
> On 2022-06-29 23:41, Logan Gunthorpe wrote:
>>
>>
>> On 2022-06-29 13:15, Robin Murphy wrote:
>>> On 2022-06-29 16:57, Logan Gunthorpe wrote:



 On 2022-06-29 06:07, Robin Murphy wrote:
> On 2022-06-15 17:12, Logan Gunthorpe wrote:
>> When a PCI P2PDMA page is seen, set the IOVA length of the segment
>> to zero so that it is not mapped into the IOVA. Then, in
>> finalise_sg(),
>> apply the appropriate bus address to the segment. The IOVA is not
>> created if the scatterlist only consists of P2PDMA pages.
>>
>> A P2PDMA page may have three possible outcomes when being mapped:
>>  1) If the data path between the two devices doesn't go through
>>     the root port, then it should be mapped with a PCI bus
>> address
>>  2) If the data path goes through the host bridge, it should be
>> mapped
>>     normally with an IOMMU IOVA.
>>  3) It is not possible for the two devices to communicate and
>> thus
>>     the mapping operation should fail (and it will return
>> -EREMOTEIO).
>>
>> Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
>> indicate bus address segments. On unmap, P2PDMA segments are skipped
>> over when determining the start and end IOVA addresses.
>>
>> With this change, the flags variable in the dma_map_ops is set to
>> DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.
>>
>> Signed-off-by: Logan Gunthorpe 
>> Reviewed-by: Jason Gunthorpe 
>> ---
>>     drivers/iommu/dma-iommu.c | 68
>> +++
>>     1 file changed, 61 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index f90251572a5d..b01ca0c6a7ab 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -21,6 +21,7 @@
>>     #include 
>>     #include 
>>     #include 
>> +#include 
>>     #include 
>>     #include 
>>     #include 
>> @@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev,
>> struct scatterlist *sg, int nents,
>>     sg_dma_address(s) = DMA_MAPPING_ERROR;
>>     sg_dma_len(s) = 0;
>>     +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
>
> Logically, should we not be able to use sg_is_dma_bus_address()
> here? I
> think it should be feasible, and simpler, to prepare the p2p segments
> up-front, such that at this point all we need to do is restore the
> original length (if even that, see below).

 Per my previous email, no, because sg_is_dma_bus_address() is not set
 yet and not meant to tell you something about the page. That flag will
 be set below by pci_p2pdma_map_bus_segment() and then checkd in
 iommu_dma_unmap_sg() to determine if the dma_address in the segment
 needs to be unmapped.
>>>
>>> I know it's not set yet as-is; I'm suggesting things should be
>>> restructured so that it *would be*. In the logical design of this code,
>>> the DMA addresses are effectively determined in iommu_dma_map_sg(), and
>>> __finalise_sg() merely converts them from a relative to an absolute form
>>> (along with undoing the other trickery). Thus the call to
>>> pci_p2pdma_map_bus_segment() absolutely belongs in the main
>>> iommu_map_sg() loop.
>>
>> I don't see how that can work: __finalise_sg() does more than convert
>> them from relative to absolute, it also figures out which SG entry will
>> contain which dma_address segment. Which segment a P2PDMA address needs
>> to be programmed into depends on the how 'cur' is calculated which in
>> turn depends on things like seg_mask and max_len. This calculation is
>> not done in iommu_dma_map_sg() so I don't see how there's any hope of
>> assigning the bus address for the P2P segments in that function.
>>
>> If there's a way to restructure things so that's possible that I'm not
>> seeing, I'm open to it but it's certainly not immediately obvious.
> 
> Huh? It's still virtually the same thing; iommu_dma_map_sg() calls
> pci_p2pdma_map_bus_segment(s) and sets s->length to 0 if
> PCI_P2PDMA_MAP_BUS_ADDR, then __finalise_sg() can use
> sg_is_dma_bus_address(s) in place of is_pci_p2pdma_page(sg_page(s)), and
> just propagate the DMA address and original length from s to cur.
> 
> Here you've written a patch which looks to correctly interrupt any
> ongoing concatenation state and convey some data from the given input
> segment to the appropriate output segment, so I'm baffled by why you'd
> think you couldn't do what you've already done.

Ah, I understand now, thanks for the patience. It took me a couple of
read throughs before I got it, but I figured it out and now have a
working implementation that looks really nice. It's a big improvement
not needing the two different P2PDMA helpers.


Re: [PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2022-06-30 Thread Robin Murphy

On 2022-06-29 23:41, Logan Gunthorpe wrote:



On 2022-06-29 13:15, Robin Murphy wrote:

On 2022-06-29 16:57, Logan Gunthorpe wrote:




On 2022-06-29 06:07, Robin Murphy wrote:

On 2022-06-15 17:12, Logan Gunthorpe wrote:

When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
     1) If the data path between the two devices doesn't go through
    the root port, then it should be mapped with a PCI bus address
     2) If the data path goes through the host bridge, it should be
mapped
    normally with an IOMMU IOVA.
     3) It is not possible for the two devices to communicate and thus
    the mapping operation should fail (and it will return
-EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
    drivers/iommu/dma-iommu.c | 68
+++
    1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f90251572a5d..b01ca0c6a7ab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -21,6 +21,7 @@
    #include 
    #include 
    #include 
+#include 
    #include 
    #include 
    #include 
@@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev,
struct scatterlist *sg, int nents,
    sg_dma_address(s) = DMA_MAPPING_ERROR;
    sg_dma_len(s) = 0;
    +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {


Logically, should we not be able to use sg_is_dma_bus_address() here? I
think it should be feasible, and simpler, to prepare the p2p segments
up-front, such that at this point all we need to do is restore the
original length (if even that, see below).


Per my previous email, no, because sg_is_dma_bus_address() is not set
yet and not meant to tell you something about the page. That flag will
be set below by pci_p2pdma_map_bus_segment() and then checkd in
iommu_dma_unmap_sg() to determine if the dma_address in the segment
needs to be unmapped.


I know it's not set yet as-is; I'm suggesting things should be
restructured so that it *would be*. In the logical design of this code,
the DMA addresses are effectively determined in iommu_dma_map_sg(), and
__finalise_sg() merely converts them from a relative to an absolute form
(along with undoing the other trickery). Thus the call to
pci_p2pdma_map_bus_segment() absolutely belongs in the main
iommu_map_sg() loop.


I don't see how that can work: __finalise_sg() does more than convert
them from relative to absolute, it also figures out which SG entry will
contain which dma_address segment. Which segment a P2PDMA address needs
to be programmed into depends on the how 'cur' is calculated which in
turn depends on things like seg_mask and max_len. This calculation is
not done in iommu_dma_map_sg() so I don't see how there's any hope of
assigning the bus address for the P2P segments in that function.

If there's a way to restructure things so that's possible that I'm not
seeing, I'm open to it but it's certainly not immediately obvious.


Huh? It's still virtually the same thing; iommu_dma_map_sg() calls 
pci_p2pdma_map_bus_segment(s) and sets s->length to 0 if 
PCI_P2PDMA_MAP_BUS_ADDR, then __finalise_sg() can use 
sg_is_dma_bus_address(s) in place of is_pci_p2pdma_page(sg_page(s)), and 
just propagate the DMA address and original length from s to cur.


Here you've written a patch which looks to correctly interrupt any 
ongoing concatenation state and convey some data from the given input 
segment to the appropriate output segment, so I'm baffled by why you'd 
think you couldn't do what you've already done.


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2022-06-29 Thread Logan Gunthorpe


On 2022-06-29 13:15, Robin Murphy wrote:
> On 2022-06-29 16:57, Logan Gunthorpe wrote:
>>
>>
>>
>> On 2022-06-29 06:07, Robin Murphy wrote:
>>> On 2022-06-15 17:12, Logan Gunthorpe wrote:
 When a PCI P2PDMA page is seen, set the IOVA length of the segment
 to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
 apply the appropriate bus address to the segment. The IOVA is not
 created if the scatterlist only consists of P2PDMA pages.

 A P2PDMA page may have three possible outcomes when being mapped:
     1) If the data path between the two devices doesn't go through
    the root port, then it should be mapped with a PCI bus address
     2) If the data path goes through the host bridge, it should be
 mapped
    normally with an IOMMU IOVA.
     3) It is not possible for the two devices to communicate and thus
    the mapping operation should fail (and it will return
 -EREMOTEIO).

 Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
 indicate bus address segments. On unmap, P2PDMA segments are skipped
 over when determining the start and end IOVA addresses.

 With this change, the flags variable in the dma_map_ops is set to
 DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

 Signed-off-by: Logan Gunthorpe 
 Reviewed-by: Jason Gunthorpe 
 ---
    drivers/iommu/dma-iommu.c | 68
 +++
    1 file changed, 61 insertions(+), 7 deletions(-)

 diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
 index f90251572a5d..b01ca0c6a7ab 100644
 --- a/drivers/iommu/dma-iommu.c
 +++ b/drivers/iommu/dma-iommu.c
 @@ -21,6 +21,7 @@
    #include 
    #include 
    #include 
 +#include 
    #include 
    #include 
    #include 
 @@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev,
 struct scatterlist *sg, int nents,
    sg_dma_address(s) = DMA_MAPPING_ERROR;
    sg_dma_len(s) = 0;
    +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
>>>
>>> Logically, should we not be able to use sg_is_dma_bus_address() here? I
>>> think it should be feasible, and simpler, to prepare the p2p segments
>>> up-front, such that at this point all we need to do is restore the
>>> original length (if even that, see below).
>>
>> Per my previous email, no, because sg_is_dma_bus_address() is not set
>> yet and not meant to tell you something about the page. That flag will
>> be set below by pci_p2pdma_map_bus_segment() and then checkd in
>> iommu_dma_unmap_sg() to determine if the dma_address in the segment
>> needs to be unmapped.
> 
> I know it's not set yet as-is; I'm suggesting things should be
> restructured so that it *would be*. In the logical design of this code,
> the DMA addresses are effectively determined in iommu_dma_map_sg(), and
> __finalise_sg() merely converts them from a relative to an absolute form
> (along with undoing the other trickery). Thus the call to
> pci_p2pdma_map_bus_segment() absolutely belongs in the main
> iommu_map_sg() loop.

I don't see how that can work: __finalise_sg() does more than convert
them from relative to absolute, it also figures out which SG entry will
contain which dma_address segment. Which segment a P2PDMA address needs
to be programmed into depends on the how 'cur' is calculated which in
turn depends on things like seg_mask and max_len. This calculation is
not done in iommu_dma_map_sg() so I don't see how there's any hope of
assigning the bus address for the P2P segments in that function.

If there's a way to restructure things so that's possible that I'm not
seeing, I'm open to it but it's certainly not immediately obvious.

 +
 +    switch (map_type) {
 +    case PCI_P2PDMA_MAP_BUS_ADDR:
 +    /*
 + * A zero length will be ignored by
 + * iommu_map_sg() and then can be detected
>>>
>>> If that is required behaviour then it needs an explicit check in
>>> iommu_map_sg() to guarantee (and document) it. It's only by chance that
>>> __iommu_map() happens to return success for size == 0 *if* all the other
>>> arguments still line up, which is a far cry from a safe no-op.
>>
>> What should such a check look like? I could certainly add some comments
>> to iommu_map_sg(), but I don't see what the code would need to check
>> for...
> 
> I'd say a check for zero-length segments would look like "if (sg->length
> == 0)", most likely with a "continue;" on the following line.

Oh, that's pretty simple to add. Will change.

>>> However, rather than play yet more silly tricks, I think it would make
>>> even more sense to make iommu_map_sg() properly aware and able to skip
>>> direct p2p segments on its own. Once it becomes normal to pass mixed
>>> scatterlists around, it's only a matter of time until 

Re: [PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2022-06-29 Thread Robin Murphy

On 2022-06-29 16:57, Logan Gunthorpe wrote:




On 2022-06-29 06:07, Robin Murphy wrote:

On 2022-06-15 17:12, Logan Gunthorpe wrote:

When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
    1) If the data path between the two devices doesn't go through
   the root port, then it should be mapped with a PCI bus address
    2) If the data path goes through the host bridge, it should be mapped
   normally with an IOMMU IOVA.
    3) It is not possible for the two devices to communicate and thus
   the mapping operation should fail (and it will return -EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
   drivers/iommu/dma-iommu.c | 68 +++
   1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f90251572a5d..b01ca0c6a7ab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -21,6 +21,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev,
struct scatterlist *sg, int nents,
   sg_dma_address(s) = DMA_MAPPING_ERROR;
   sg_dma_len(s) = 0;
   +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {


Logically, should we not be able to use sg_is_dma_bus_address() here? I
think it should be feasible, and simpler, to prepare the p2p segments
up-front, such that at this point all we need to do is restore the
original length (if even that, see below).


Per my previous email, no, because sg_is_dma_bus_address() is not set
yet and not meant to tell you something about the page. That flag will
be set below by pci_p2pdma_map_bus_segment() and then checkd in
iommu_dma_unmap_sg() to determine if the dma_address in the segment
needs to be unmapped.


I know it's not set yet as-is; I'm suggesting things should be 
restructured so that it *would be*. In the logical design of this code, 
the DMA addresses are effectively determined in iommu_dma_map_sg(), and 
__finalise_sg() merely converts them from a relative to an absolute form 
(along with undoing the other trickery). Thus the call to 
pci_p2pdma_map_bus_segment() absolutely belongs in the main 
iommu_map_sg() loop.



+    if (i > 0)
+    cur = sg_next(cur);
+
+    pci_p2pdma_map_bus_segment(s, cur);
+    count++;
+    cur_len = 0;
+    continue;
+    }
+
   /*
    * Now fill in the real DMA data. If...
    * - there is a valid output segment to append to
@@ -1158,6 +1169,8 @@ static int iommu_dma_map_sg(struct device *dev,
struct scatterlist *sg,
   struct iova_domain *iovad = >iovad;
   struct scatterlist *s, *prev = NULL;
   int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+    struct dev_pagemap *pgmap = NULL;
+    enum pci_p2pdma_map_type map_type;
   dma_addr_t iova;
   size_t iova_len = 0;
   unsigned long mask = dma_get_seg_boundary(dev);
@@ -1193,6 +1206,35 @@ static int iommu_dma_map_sg(struct device *dev,
struct scatterlist *sg,
   s_length = iova_align(iovad, s_length + s_iova_off);
   s->length = s_length;
   +    if (is_pci_p2pdma_page(sg_page(s))) {
+    if (sg_page(s)->pgmap != pgmap) {
+    pgmap = sg_page(s)->pgmap;
+    map_type = pci_p2pdma_map_type(pgmap, dev);
+    }


There's a definite code smell here, but per above and below I think we
*should* actually call the new helper instead of copy-pasting half of it.






+
+    switch (map_type) {
+    case PCI_P2PDMA_MAP_BUS_ADDR:
+    /*
+ * A zero length will be ignored by
+ * iommu_map_sg() and then can be detected


If that is required behaviour then it needs an explicit check in
iommu_map_sg() to guarantee (and document) it. It's only by chance that
__iommu_map() happens to return success for size == 0 *if* all the other
arguments still line up, which is a far cry from a safe no-op.


What should such a check look like? I could certainly add some comments
to iommu_map_sg(), but I don't see what the code would need to check for...


I'd say a check for zero-length segments would look like "if (sg->length 
== 0)", most likely with a "continue;" on the following line.



However, 

Re: [PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2022-06-29 Thread Logan Gunthorpe



On 2022-06-29 06:07, Robin Murphy wrote:
> On 2022-06-15 17:12, Logan Gunthorpe wrote:
>> When a PCI P2PDMA page is seen, set the IOVA length of the segment
>> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
>> apply the appropriate bus address to the segment. The IOVA is not
>> created if the scatterlist only consists of P2PDMA pages.
>>
>> A P2PDMA page may have three possible outcomes when being mapped:
>>    1) If the data path between the two devices doesn't go through
>>   the root port, then it should be mapped with a PCI bus address
>>    2) If the data path goes through the host bridge, it should be mapped
>>   normally with an IOMMU IOVA.
>>    3) It is not possible for the two devices to communicate and thus
>>   the mapping operation should fail (and it will return -EREMOTEIO).
>>
>> Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
>> indicate bus address segments. On unmap, P2PDMA segments are skipped
>> over when determining the start and end IOVA addresses.
>>
>> With this change, the flags variable in the dma_map_ops is set to
>> DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.
>>
>> Signed-off-by: Logan Gunthorpe 
>> Reviewed-by: Jason Gunthorpe 
>> ---
>>   drivers/iommu/dma-iommu.c | 68 +++
>>   1 file changed, 61 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index f90251572a5d..b01ca0c6a7ab 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -21,6 +21,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   #include 
>>   #include 
>>   #include 
>> @@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev,
>> struct scatterlist *sg, int nents,
>>   sg_dma_address(s) = DMA_MAPPING_ERROR;
>>   sg_dma_len(s) = 0;
>>   +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
> 
> Logically, should we not be able to use sg_is_dma_bus_address() here? I
> think it should be feasible, and simpler, to prepare the p2p segments
> up-front, such that at this point all we need to do is restore the
> original length (if even that, see below).

Per my previous email, no, because sg_is_dma_bus_address() is not set
yet and not meant to tell you something about the page. That flag will
be set below by pci_p2pdma_map_bus_segment() and then checkd in
iommu_dma_unmap_sg() to determine if the dma_address in the segment
needs to be unmapped.

> 
>> +    if (i > 0)
>> +    cur = sg_next(cur);
>> +
>> +    pci_p2pdma_map_bus_segment(s, cur);
>> +    count++;
>> +    cur_len = 0;
>> +    continue;
>> +    }
>> +
>>   /*
>>    * Now fill in the real DMA data. If...
>>    * - there is a valid output segment to append to
>> @@ -1158,6 +1169,8 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>   struct iova_domain *iovad = >iovad;
>>   struct scatterlist *s, *prev = NULL;
>>   int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
>> +    struct dev_pagemap *pgmap = NULL;
>> +    enum pci_p2pdma_map_type map_type;
>>   dma_addr_t iova;
>>   size_t iova_len = 0;
>>   unsigned long mask = dma_get_seg_boundary(dev);
>> @@ -1193,6 +1206,35 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>   s_length = iova_align(iovad, s_length + s_iova_off);
>>   s->length = s_length;
>>   +    if (is_pci_p2pdma_page(sg_page(s))) {
>> +    if (sg_page(s)->pgmap != pgmap) {
>> +    pgmap = sg_page(s)->pgmap;
>> +    map_type = pci_p2pdma_map_type(pgmap, dev);
>> +    }
> 
> There's a definite code smell here, but per above and below I think we
> *should* actually call the new helper instead of copy-pasting half of it.


> 
>> +
>> +    switch (map_type) {
>> +    case PCI_P2PDMA_MAP_BUS_ADDR:
>> +    /*
>> + * A zero length will be ignored by
>> + * iommu_map_sg() and then can be detected
> 
> If that is required behaviour then it needs an explicit check in
> iommu_map_sg() to guarantee (and document) it. It's only by chance that
> __iommu_map() happens to return success for size == 0 *if* all the other
> arguments still line up, which is a far cry from a safe no-op.

What should such a check look like? I could certainly add some comments
to iommu_map_sg(), but I don't see what the code would need to check for...

> However, rather than play yet more silly tricks, I think it would make
> even more sense to make iommu_map_sg() properly aware and able to skip
> direct p2p segments on its own. Once it becomes normal to pass mixed
> scatterlists around, it's only a matter of time until one ends up being
> handed to a driver which manages its own IOMMU domain, and then what?

I suppose we can add another call 

Re: [PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2022-06-29 Thread Robin Murphy

On 2022-06-15 17:12, Logan Gunthorpe wrote:

When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
   1) If the data path between the two devices doesn't go through
  the root port, then it should be mapped with a PCI bus address
   2) If the data path goes through the host bridge, it should be mapped
  normally with an IOMMU IOVA.
   3) It is not possible for the two devices to communicate and thus
  the mapping operation should fail (and it will return -EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
  drivers/iommu/dma-iommu.c | 68 +++
  1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f90251572a5d..b01ca0c6a7ab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -21,6 +21,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
  
+		if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {


Logically, should we not be able to use sg_is_dma_bus_address() here? I 
think it should be feasible, and simpler, to prepare the p2p segments 
up-front, such that at this point all we need to do is restore the 
original length (if even that, see below).



+   if (i > 0)
+   cur = sg_next(cur);
+
+   pci_p2pdma_map_bus_segment(s, cur);
+   count++;
+   cur_len = 0;
+   continue;
+   }
+
/*
 * Now fill in the real DMA data. If...
 * - there is a valid output segment to append to
@@ -1158,6 +1169,8 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
struct iova_domain *iovad = >iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   struct dev_pagemap *pgmap = NULL;
+   enum pci_p2pdma_map_type map_type;
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
@@ -1193,6 +1206,35 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
s_length = iova_align(iovad, s_length + s_iova_off);
s->length = s_length;
  
+		if (is_pci_p2pdma_page(sg_page(s))) {

+   if (sg_page(s)->pgmap != pgmap) {
+   pgmap = sg_page(s)->pgmap;
+   map_type = pci_p2pdma_map_type(pgmap, dev);
+   }


There's a definite code smell here, but per above and below I think we 
*should* actually call the new helper instead of copy-pasting half of it.



+
+   switch (map_type) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   /*
+* A zero length will be ignored by
+* iommu_map_sg() and then can be detected


If that is required behaviour then it needs an explicit check in 
iommu_map_sg() to guarantee (and document) it. It's only by chance that 
__iommu_map() happens to return success for size == 0 *if* all the other 
arguments still line up, which is a far cry from a safe no-op.


However, rather than play yet more silly tricks, I think it would make 
even more sense to make iommu_map_sg() properly aware and able to skip 
direct p2p segments on its own. Once it becomes normal to pass mixed 
scatterlists around, it's only a matter of time until one ends up being 
handed to a driver which manages its own IOMMU domain, and then what?



+* in __finalise_sg() to actually map the
+* bus address.
+*/
+   s->length = 0;
+   continue;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   /*
+* Mapping through host bridge should be
+* mapped with regular IOVAs, thus we
+* do 

[PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2022-06-15 Thread Logan Gunthorpe
When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
  1) If the data path between the two devices doesn't go through
 the root port, then it should be mapped with a PCI bus address
  2) If the data path goes through the host bridge, it should be mapped
 normally with an IOMMU IOVA.
  3) It is not possible for the two devices to communicate and thus
 the mapping operation should fail (and it will return -EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 drivers/iommu/dma-iommu.c | 68 +++
 1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f90251572a5d..b01ca0c6a7ab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
 
+   if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+   if (i > 0)
+   cur = sg_next(cur);
+
+   pci_p2pdma_map_bus_segment(s, cur);
+   count++;
+   cur_len = 0;
+   continue;
+   }
+
/*
 * Now fill in the real DMA data. If...
 * - there is a valid output segment to append to
@@ -1158,6 +1169,8 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
struct iova_domain *iovad = >iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   struct dev_pagemap *pgmap = NULL;
+   enum pci_p2pdma_map_type map_type;
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
@@ -1193,6 +1206,35 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
s_length = iova_align(iovad, s_length + s_iova_off);
s->length = s_length;
 
+   if (is_pci_p2pdma_page(sg_page(s))) {
+   if (sg_page(s)->pgmap != pgmap) {
+   pgmap = sg_page(s)->pgmap;
+   map_type = pci_p2pdma_map_type(pgmap, dev);
+   }
+
+   switch (map_type) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   /*
+* A zero length will be ignored by
+* iommu_map_sg() and then can be detected
+* in __finalise_sg() to actually map the
+* bus address.
+*/
+   s->length = 0;
+   continue;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   /*
+* Mapping through host bridge should be
+* mapped with regular IOVAs, thus we
+* do nothing here and continue below.
+*/
+   break;
+   default:
+   ret = -EREMOTEIO;
+   goto out_restore_sg;
+   }
+   }
+
/*
 * Due to the alignment of our single IOVA allocation, we can
 * depend on these assumptions about the segment boundary mask:
@@ -1215,6 +1257,9 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = s;
}
 
+   if (!iova_len)
+   return __finalise_sg(dev, sg, nents, 0);
+
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
if (!iova) {
ret = -ENOMEM;
@@ -1236,7 +1281,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
 out_restore_sg:
__invalidate_sg(sg, nents);
 out:
-   if (ret != -ENOMEM)
+   if (ret != -ENOMEM && ret != -EREMOTEIO)
return -EINVAL;
return