Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-09-07 Thread Alexey Kardashevskiy




On 04/09/2020 16:04, Leonardo Bras wrote:

> On Thu, 2020-09-03 at 14:41 +1000, Alexey Kardashevskiy wrote:
>>> I am new to this, so I am trying to understand how a memory page mapped
>>> as DMA, and used for something else could be a problem.
>>
>> From the device perspective, there is PCI space and everything from 0
>> till 1<<64 is accessible and what is that mapped to - the device does
>> not know. PHB's IOMMU is the thing to notice invalid access and raise
>> EEH but PHB only knows about PCI->physical memory mapping (with IOMMU
>> pages) but nothing about the host kernel pages. Does this help? Thanks,
>
> According to our conversation on Slack:
> 1- There is a problem if a hypervisor gives its VMs contiguous
> memory blocks that are not aligned to IOMMU pages, because then an
> iommu_map_page() could map some memory in this VM and some memory in
> another VM / process.
> 2- To avoid this, we should have system pagesize >= iommu_pagesize
>
> One way to get (2) is by doing this in enable_ddw():
> if ((query.page_size & 4) && PAGE_SHIFT >= 24) {

You won't ever (well, soon) see PAGE_SHIFT==24, it is either 4K or 64K.
However 16MB IOMMU pages are fine - if the hypervisor uses huge pages for
VM RAM, it also then advertises huge IOMMU pages in ddw-query. So for the
1:1 case there must be no "PAGE_SHIFT >= 24".




> page_shift = 24; /* 16MB */
> } else if ((query.page_size & 2) &&  PAGE_SHIFT >= 16 ) {
> page_shift = 16; /* 64kB */
> } else if (query.page_size & 1 &&  PAGE_SHIFT >= 12) {
> page_shift = 12; /* 4kB */
> [...]
>
> Another way of solving this would be adding to the LoPAR documentation
> that the block size of contiguous memory the hypervisor gives a VM
> should always be aligned to the IOMMU pagesize offered.


I think this is assumed already by the design of the DDW API.



> I think the best approach would be to first send the above patch, which
> is faster, and then work on adding that to the documentation, so
> hypervisors guarantee this.
>
> If this gets into the docs, we can revert the patch.
>
> What do you think?
I think we diverted from the original patch :) I am not quite sure what 
you were fixing there. Thanks,



--
Alexey


Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-09-04 Thread Leonardo Bras
On Thu, 2020-09-03 at 14:41 +1000, Alexey Kardashevskiy wrote:
> > I am new to this, so I am trying to understand how a memory page mapped
> > as DMA, and used for something else could be a problem.
> 
>  From the device perspective, there is PCI space and everything from 0 
> till 1<<64 is accessible and what is that mapped to - the device does 
> not know. PHB's IOMMU is the thing to notice invalid access and raise 
> EEH but PHB only knows about PCI->physical memory mapping (with IOMMU 
> pages) but nothing about the host kernel pages. Does this help? Thanks,

According to our conversation on Slack:
1- There is a problem if a hypervisor gives its VMs contiguous
memory blocks that are not aligned to IOMMU pages, because then an 
iommu_map_page() could map some memory in this VM and some memory in
another VM / process.
2- To avoid this, we should have system pagesize >= iommu_pagesize 

One way to get (2) is by doing this in enable_ddw():
if ((query.page_size & 4) && PAGE_SHIFT >= 24) {
page_shift = 24; /* 16MB */
} else if ((query.page_size & 2) &&  PAGE_SHIFT >= 16 ) {
page_shift = 16; /* 64kB */
} else if (query.page_size & 1 &&  PAGE_SHIFT >= 12) {
page_shift = 12; /* 4kB */
[...]
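
For clarity, the same constraint written as a small standalone helper (a
sketch only: ddw_pick_page_shift() is a made-up name, and the bit meanings
of query.page_size are taken from the snippet above, so treat both as
assumptions):

/*
 * Sketch: pick the largest IOMMU page size that the hypervisor offers in
 * the ddw-query bitmask *and* that does not exceed the system page size,
 * so a single TCE can never cover memory of more than one system page.
 */
static int ddw_pick_page_shift(u32 query_page_size)
{
        /* bit 2 = 16MB, bit 1 = 64kB, bit 0 = 4kB, as in the snippet above */
        static const struct { u32 mask; int shift; } opts[] = {
                { 4, 24 }, { 2, 16 }, { 1, 12 },
        };
        int i;

        for (i = 0; i < ARRAY_SIZE(opts); i++)
                if ((query_page_size & opts[i].mask) && PAGE_SHIFT >= opts[i].shift)
                        return opts[i].shift;

        return -EINVAL; /* nothing offered fits under the system page size */
}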

Another way of solving this would be adding to the LoPAR documentation
that the block size of contiguous memory the hypervisor gives a VM
should always be aligned to the IOMMU pagesize offered.

I think the best approach would be to first send the above patch, which
is faster, and then work on adding that to the documentation, so
hypervisors guarantee this.

If this gets into the docs, we can revert the patch.

What do you think?

Best regards!



Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-09-02 Thread Alexey Kardashevskiy




On 02/09/2020 08:34, Leonardo Bras wrote:

> On Mon, 2020-08-31 at 10:47 +1000, Alexey Kardashevskiy wrote:
>>> Maybe testing with host 64k pagesize and IOMMU 16MB pagesize in qemu
>>> should be enough, is there any chance to get indirect mapping in qemu
>>> like this? (DDW but with smaller DMA window available)
>>
>> You will have to hack the guest kernel to always do indirect mapping or
>> hack QEMU's rtas_ibm_query_pe_dma_window() to return a small number of
>> available TCEs. But you will be testing QEMU/KVM which behave quite
>> differently to pHyp in this particular case.
>
> As you suggested before, building for 4k cpu pagesize should be the
> best approach. It would allow testing for both pHyp and qemu scenarios.
>
>>>>>> Because if we want the former (==support), then we'll have to align the
>>>>>> size up to the bigger page size when allocating/zeroing system pages,
>>>>>> etc.
>>>>>
>>>>> This part I don't understand. Why do we need to align everything to the
>>>>> bigger pagesize?
>>>>>
>>>>> I mean, is not that enough that the range [ret, ret + size[ is both
>>>>> allocated by mm and mapped on a iommu range?
>>>>>
>>>>> Suppose a iommu_alloc_coherent() of 16kB on PAGESIZE = 4k and
>>>>> IOMMU_PAGE_SIZE() == 64k.
>>>>> Why 4 * cpu_pages mapped by a 64k IOMMU page is not enough?
>>>>> All the space the user asked for is allocated and mapped for DMA.
>>>>
>>>> The user asked to map 16K, the rest - 48K - is used for something else
>>>> (may be even mapped to another device) but you are making all 64K
>>>> accessible by the device which only should be able to access 16K.
>>>>
>>>> In practice, if this happens, H_PUT_TCE will simply fail.
>>>
>>> I have noticed mlx5 driver getting a few bytes in a buffer, and using
>>> iommu_map_page(). It does map a whole page for as few bytes as the user
>>
>> Whole 4K system page or whole 64K iommu page?
>
> I tested it in 64k system page + 64k iommu page.
>
> The 64K system page may be used for anything, and a small portion of it
> (say 128 bytes) needs to be used for DMA.
> The whole page is mapped by IOMMU, and the driver gets info of the
> memory range it should access / modify.



This works because the whole system page belongs to the same memory 
context and IOMMU allows a device to access that page. You can still 
have problems if there is a bug within the page but it will go mostly 
unnoticed as it will be memory corruption.


If your system page is smaller (4K) than the IOMMU page (64K), then the 
device gets wider access than it should, but it is still going to be 
silent memory corruption.
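
To make the "wider access" concrete, here is a standalone illustration
(plain userspace C with hypothetical numbers and addresses; not kernel
code):

#include <stdio.h>

int main(void)
{
        unsigned long page_size  = 4096;        /* system PAGE_SIZE       */
        unsigned long iommu_size = 65536;       /* IOMMU_PAGE_SIZE(tbl)   */
        unsigned long buf = 0x12345000UL;       /* hypothetical 4K buffer */

        /* The TCE must cover the whole 64K-aligned region around buf... */
        unsigned long tce_start = buf & ~(iommu_size - 1);
        /* ...so this many unrelated 4K pages become device-accessible.  */
        unsigned long exposed = iommu_size / page_size - 1;

        printf("TCE maps [%#lx, %#lx): %lu unrelated 4K pages exposed\n",
               tce_start, tce_start + iommu_size, exposed);
        return 0;
}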








>>> wants mapped, and the other bytes get used for something else, or just
>>> mapped on another DMA page.
>>> It seems to work fine.
>>
>> With 4K system page and 64K IOMMU page? In practice it would take an
>> effort or/and bad luck to see it crashing. Thanks,
>
> I haven't tested it yet. On a 64k system page and 4k/64k iommu page, it
> works as described above.
>
> I am new to this, so I am trying to understand how a memory page mapped
> as DMA, and used for something else could be a problem.

From the device perspective, there is PCI space and everything from 0 
till 1<<64 is accessible and what is that mapped to - the device does 
not know. PHB's IOMMU is the thing to notice invalid access and raise 
EEH but PHB only knows about PCI->physical memory mapping (with IOMMU 
pages) but nothing about the host kernel pages. Does this help? Thanks,

> Thanks!






>>>>>> Bigger pages are not the case here as I understand it.
>>>>>
>>>>> I did not get this part, what do you mean?
>>>>
>>>> Possible IOMMU page sizes are 4K, 64K, 2M, 16M, 256M, 1GB, and the
>>>> supported set of sizes is different for P8/P9 and type of IO (PHB,
>>>> NVLink/CAPI).
>>>>
>>>>>>> Update those functions to guarantee alignment with requested size
>>>>>>> using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().
>>>>>>>
>>>>>>> Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
>>>>>>> with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
>>>>>>>
>>>>>>> Signed-off-by: Leonardo Bras 
>>>>>>> ---
>>>>>>>   arch/powerpc/kernel/iommu.c | 17 +
>>>>>>>   1 file changed, 9 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>>>>>> index 9704f3f76e63..d7086087830f 100644
>>>>>>> --- a/arch/powerpc/kernel/iommu.c
>>>>>>> +++ b/arch/powerpc/kernel/iommu.c
>>>>>>> @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device *dev,
>>>>>>> 	}
>>>>>>>
>>>>>>> 	if (dev)
>>>>>>> -   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
>>>>>>> - 1 << tbl->it_page_shift);
>>>>>>> +   boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, tbl);
>>>>>>
>>>>>> Run checkpatch.pl, should complain about a long line.
>>>>>
>>>>> It's 86 columns long, which is less than the new limit of 100 columns
>>>>> Linus announced a few weeks ago. checkpatch.pl was updated too:
>>>>> https://www.phoronix.com/scan.php?page=news_item=Linux-Kernel-Deprecates-80-Col
>>>>
>>>> Yay finally :) Thanks,
>>>
>>> :)
>>>
>>>>>>> 	else
>>>>>>> -   boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
>>>>>>> +   boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
>>>>>>> 	/* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
>>>>>>>
>>>>>>> 	n = 

Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-09-01 Thread Leonardo Bras
On Mon, 2020-08-31 at 10:47 +1000, Alexey Kardashevskiy wrote:
> > 
> > Maybe testing with host 64k pagesize and IOMMU 16MB pagesize in qemu
> > should be enough, is there any chance to get indirect mapping in qemu
> > like this? (DDW but with smaller DMA window available) 
> 
> You will have to hack the guest kernel to always do indirect mapping or
> hack QEMU's rtas_ibm_query_pe_dma_window() to return a small number of
> available TCEs. But you will be testing QEMU/KVM which behave quite
> differently to pHyp in this particular case.
> 

As you suggested before, building for 4k cpu pagesize should be the
best approach. It would allow testing for both pHyp and qemu scenarios.

> > > > > Because if we want the former (==support), then we'll have to align 
> > > > > the
> > > > > size up to the bigger page size when allocating/zeroing system pages,
> > > > > etc. 
> > > > 
> > > > This part I don't understand. Why do we need to align everything to the
> > > > bigger pagesize? 
> > > > 
> > > > I mean, is not that enough that the range [ret, ret + size[ is both
> > > > allocated by mm and mapped on a iommu range?
> > > > 
> > > > Suppose a iommu_alloc_coherent() of 16kB on PAGESIZE = 4k and
> > > > IOMMU_PAGE_SIZE() == 64k.
> > > > Why 4 * cpu_pages mapped by a 64k IOMMU page is not enough? 
> > > > All the space the user asked for is allocated and mapped for DMA.
> > > 
> > > The user asked to map 16K, the rest - 48K - is used for something else
> > > (may be even mapped to another device) but you are making all 64K
> > > accessible by the device which only should be able to access 16K.
> > > 
> > > In practice, if this happens, H_PUT_TCE will simply fail.
> > 
> > I have noticed mlx5 driver getting a few bytes in a buffer, and using
> > iommu_map_page(). It does map a whole page for as few bytes as the user
> 
> Whole 4K system page or whole 64K iommu page?

I tested it in 64k system page + 64k iommu page.

The 64K system page may be used for anything, and a small portion of it
(say 128 bytes) needs to be used for DMA. 
The whole page is mapped by IOMMU, and the driver gets info of the
memory range it should access / modify.

> 
> > wants mapped, and the other bytes get used for something else, or just
> > mapped on another DMA page.
> > It seems to work fine.  
> 
> 
> With 4K system page and 64K IOMMU page? In practice it would take an
> effort or/and bad luck to see it crashing. Thanks,

I haven't tested it yet. On a 64k system page and 4k/64k iommu page, it
works as described above.

I am new to this, so I am trying to understand how a memory page mapped
as DMA, and used for something else could be a problem.

Thanks!

> 
> > > 
> > > > > Bigger pages are not the case here as I understand it.
> > > > 
> > > > I did not get this part, what do you mean?
> > > 
> > > Possible IOMMU page sizes are 4K, 64K, 2M, 16M, 256M, 1GB, and the
> > > supported set of sizes is different for P8/P9 and type of IO (PHB,
> > > NVLink/CAPI).
> > > 
> > > 
> > > > > > Update those functions to guarantee alignment with requested size
> > > > > > using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().
> > > > > > 
> > > > > > Also, on iommu_range_alloc(), replace ALIGN(n, 1 << 
> > > > > > tbl->it_page_shift)
> > > > > > with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
> > > > > > 
> > > > > > Signed-off-by: Leonardo Bras 
> > > > > > ---
> > > > > >  arch/powerpc/kernel/iommu.c | 17 +
> > > > > >  1 file changed, 9 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/powerpc/kernel/iommu.c 
> > > > > > b/arch/powerpc/kernel/iommu.c
> > > > > > index 9704f3f76e63..d7086087830f 100644
> > > > > > --- a/arch/powerpc/kernel/iommu.c
> > > > > > +++ b/arch/powerpc/kernel/iommu.c
> > > > > > @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct 
> > > > > > device *dev,
> > > > > > }
> > > > > >  
> > > > > > if (dev)
> > > > > > -   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > > > > > - 1 << tbl->it_page_shift);
> > > > > > +   boundary_size = 
> > > > > > IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, tbl);
> > > > > 
> > > > > Run checkpatch.pl, should complain about a long line.
> > > > 
> > > > It's 86 columns long, which is less than the new limit of 100 columns
> > > > Linus announced a few weeks ago. checkpatch.pl was updated too:
> > > > https://www.phoronix.com/scan.php?page=news_item=Linux-Kernel-Deprecates-80-Col
> > > 
> > > Yay finally :) Thanks,
> > 
> > :)
> > 
> > > 
> > > > > > else
> > > > > > -   boundary_size = ALIGN(1UL << 32, 1 << 
> > > > > > tbl->it_page_shift);
> > > > > > +   boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
> > > > > > /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
> > > > > >  
> > > > > > n = iommu_area_alloc(tbl->it_map, limit, start, npages, 
> > > > > > tbl->it_offset,
> > > > > > @@ -858,6 +857,7 @@ void 

Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-08-30 Thread Alexey Kardashevskiy



On 29/08/2020 06:41, Leonardo Bras wrote:
> On Fri, 2020-08-28 at 11:40 +1000, Alexey Kardashevskiy wrote:
>>> I think it would be better to keep the code as much generic as possible
>>> regarding page sizes. 
>>
>> Then you need to test it. Does 4K guest even boot (it should but I would
>> not bet much on it)?
> 
> Maybe testing with host 64k pagesize and IOMMU 16MB pagesize in qemu
> should be enough, is there any chance to get indirect mapping in qemu
> like this? (DDW but with smaller DMA window available) 


You will have to hack the guest kernel to always do indirect mapping or
hack QEMU's rtas_ibm_query_pe_dma_window() to return a small number of
available TCEs. But you will be testing QEMU/KVM which behave quite
differently to pHyp in this particular case.



>>>> Because if we want the former (==support), then we'll have to align the
>>>> size up to the bigger page size when allocating/zeroing system pages,
>>>> etc. 
>>>
>>> This part I don't understand. Why do we need to align everything to the
>>> bigger pagesize? 
>>>
>>> I mean, is not that enough that the range [ret, ret + size[ is both
>>> allocated by mm and mapped on a iommu range?
>>>
>>> Suppose a iommu_alloc_coherent() of 16kB on PAGESIZE = 4k and
>>> IOMMU_PAGE_SIZE() == 64k.
>>> Why 4 * cpu_pages mapped by a 64k IOMMU page is not enough? 
>>> All the space the user asked for is allocated and mapped for DMA.
>>
>> The user asked to map 16K, the rest - 48K - is used for something else
>> (may be even mapped to another device) but you are making all 64K
>> accessible by the device which only should be able to access 16K.
>>
>> In practice, if this happens, H_PUT_TCE will simply fail.
> 
> I have noticed mlx5 driver getting a few bytes in a buffer, and using
> iommu_map_page(). It does map a whole page for as few bytes as the user


Whole 4K system page or whole 64K iommu page?

> wants mapped, and the other bytes get used for something else, or just
> mapped on another DMA page.
> It seems to work fine.  



With 4K system page and 64K IOMMU page? In practice it would take an
effort or/and bad luck to see it crashing. Thanks,



> 
>>
>>
>>>> Bigger pages are not the case here as I understand it.
>>>
>>> I did not get this part, what do you mean?
>>
>> Possible IOMMU page sizes are 4K, 64K, 2M, 16M, 256M, 1GB, and the
>> supported set of sizes is different for P8/P9 and type of IO (PHB,
>> NVLink/CAPI).
>>
>>
>>>>> Update those functions to guarantee alignment with requested size
>>>>> using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().
>>>>>
>>>>> Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
>>>>> with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
>>>>>
>>>>> Signed-off-by: Leonardo Bras 
>>>>> ---
>>>>>  arch/powerpc/kernel/iommu.c | 17 +
>>>>>  1 file changed, 9 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>>>> index 9704f3f76e63..d7086087830f 100644
>>>>> --- a/arch/powerpc/kernel/iommu.c
>>>>> +++ b/arch/powerpc/kernel/iommu.c
>>>>> @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device *dev,
>>>>>   }
>>>>>  
>>>>>   if (dev)
>>>>> - boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
>>>>> -   1 << tbl->it_page_shift);
>>>>> + boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, tbl);

>>>> Run checkpatch.pl, should complain about a long line.
>>>
>>> It's 86 columns long, which is less than the new limit of 100 columns
>>> Linus announced a few weeks ago. checkpatch.pl was updated too:
>>> https://www.phoronix.com/scan.php?page=news_item=Linux-Kernel-Deprecates-80-Col
>>
>> Yay finally :) Thanks,
> 
> :)
> 
>>
>>

>>>>>   else
>>>>> - boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
>>>>> + boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
>>>>>   /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
>>>>>  
>>>>>   n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
>>>>> @@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>>   unsigned int order;
>>>>>   unsigned int nio_pages, io_order;
>>>>>   struct page *page;
>>>>> + size_t size_io = size;
>>>>>  
>>>>>   size = PAGE_ALIGN(size);
>>>>>   order = get_order(size);
>>>>> @@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>>   memset(ret, 0, size);
>>>>>  
>>>>>   /* Set up tces to cover the allocated range */
>>>>> - nio_pages = size >> tbl->it_page_shift;
>>>>> - io_order = get_iommu_order(size, tbl);
>>>>> + size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
>>>>> + nio_pages = size_io >> tbl->it_page_shift;
>>>>> + io_order = get_iommu_order(size_io, tbl);
>>>>>   mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
>>>>> mask >> tbl->it_page_shift, io_order, 

Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-08-28 Thread Leonardo Bras
On Fri, 2020-08-28 at 11:40 +1000, Alexey Kardashevskiy wrote:
> > I think it would be better to keep the code as much generic as possible
> > regarding page sizes. 
> 
> Then you need to test it. Does 4K guest even boot (it should but I would
> not bet much on it)?

Maybe testing with host 64k pagesize and IOMMU 16MB pagesize in qemu
should be enough, is there any chance to get indirect mapping in qemu
like this? (DDW but with smaller DMA window available) 

> > > Because if we want the former (==support), then we'll have to align the
> > > size up to the bigger page size when allocating/zeroing system pages,
> > > etc. 
> > 
> > This part I don't understand. Why do we need to align everything to the
> > bigger pagesize? 
> > 
> > I mean, is not that enough that the range [ret, ret + size[ is both
> > allocated by mm and mapped on a iommu range?
> > 
> > Suppose a iommu_alloc_coherent() of 16kB on PAGESIZE = 4k and
> > IOMMU_PAGE_SIZE() == 64k.
> > Why 4 * cpu_pages mapped by a 64k IOMMU page is not enough? 
> > All the space the user asked for is allocated and mapped for DMA.
> 
> The user asked to map 16K, the rest - 48K - is used for something else
> (may be even mapped to another device) but you are making all 64K
> accessible by the device which only should be able to access 16K.
> 
> In practice, if this happens, H_PUT_TCE will simply fail.

I have noticed mlx5 driver getting a few bytes in a buffer, and using
iommu_map_page(). It does map a whole page for as few bytes as the user
wants mapped, and the other bytes get used for something else, or just
mapped on another DMA page.
It seems to work fine.  
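
A hedged illustration of the pattern described above (generic DMA API
usage, not the actual mlx5 code; dev is assumed to be a valid struct
device and the error returns are illustrative):

        /* 128 bytes inside some 64K system page */
        void *buf = kmalloc(128, GFP_KERNEL);
        dma_addr_t dma;

        if (!buf)
                return -ENOMEM;

        /* Only these 128 bytes are handed out for DMA, but the IOMMU
         * still has to map at least one whole IOMMU page around them. */
        dma = dma_map_single(dev, buf, 128, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, dma))
                return -EIO;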

> 
> 
> > > Bigger pages are not the case here as I understand it.
> > 
> > I did not get this part, what do you mean?
> 
> Possible IOMMU page sizes are 4K, 64K, 2M, 16M, 256M, 1GB, and the
> supported set of sizes is different for P8/P9 and type of IO (PHB,
> NVLink/CAPI).
> 
> 
> > > > Update those functions to guarantee alignment with requested size
> > > > using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().
> > > > 
> > > > Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
> > > > with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
> > > > 
> > > > Signed-off-by: Leonardo Bras 
> > > > ---
> > > >  arch/powerpc/kernel/iommu.c | 17 +
> > > >  1 file changed, 9 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > > > index 9704f3f76e63..d7086087830f 100644
> > > > --- a/arch/powerpc/kernel/iommu.c
> > > > +++ b/arch/powerpc/kernel/iommu.c
> > > > @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct 
> > > > device *dev,
> > > > }
> > > >  
> > > > if (dev)
> > > > -   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > > > - 1 << tbl->it_page_shift);
> > > > +   boundary_size = 
> > > > IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, tbl);
> > > 
> > > Run checkpatch.pl, should complain about a long line.
> > 
> > It's 86 columns long, which is less than the new limit of 100 columns
> > Linus announced a few weeks ago. checkpatch.pl was updated too:
> > https://www.phoronix.com/scan.php?page=news_item=Linux-Kernel-Deprecates-80-Col
> 
> Yay finally :) Thanks,

:)

> 
> 
> > > 
> > > > else
> > > > -   boundary_size = ALIGN(1UL << 32, 1 << 
> > > > tbl->it_page_shift);
> > > > +   boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
> > > > /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
> > > >  
> > > > n = iommu_area_alloc(tbl->it_map, limit, start, npages, 
> > > > tbl->it_offset,
> > > > @@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, 
> > > > struct iommu_table *tbl,
> > > > unsigned int order;
> > > > unsigned int nio_pages, io_order;
> > > > struct page *page;
> > > > +   size_t size_io = size;
> > > >  
> > > > size = PAGE_ALIGN(size);
> > > > order = get_order(size);
> > > > @@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, 
> > > > struct iommu_table *tbl,
> > > > memset(ret, 0, size);
> > > >  
> > > > /* Set up tces to cover the allocated range */
> > > > -   nio_pages = size >> tbl->it_page_shift;
> > > > -   io_order = get_iommu_order(size, tbl);
> > > > +   size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
> > > > +   nio_pages = size_io >> tbl->it_page_shift;
> > > > +   io_order = get_iommu_order(size_io, tbl);
> > > > mapping = iommu_alloc(dev, tbl, ret, nio_pages, 
> > > > DMA_BIDIRECTIONAL,
> > > >   mask >> tbl->it_page_shift, io_order, 0);
> > > > if (mapping == DMA_MAPPING_ERROR) {
> > > > @@ -900,11 +901,11 @@ void iommu_free_coherent(struct iommu_table *tbl, 
> > > > size_t size,
> > > >  void *vaddr, dma_addr_t dma_handle)

Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-08-27 Thread Alexey Kardashevskiy



On 28/08/2020 02:51, Leonardo Bras wrote:
> On Sat, 2020-08-22 at 20:07 +1000, Alexey Kardashevskiy wrote:
>>
>> On 18/08/2020 09:40, Leonardo Bras wrote:
>>> Both iommu_alloc_coherent() and iommu_free_coherent() assume that once
>>> size is aligned to PAGE_SIZE it will be aligned to IOMMU_PAGE_SIZE.
>>
>> The only case when it is not aligned is when IOMMU_PAGE_SIZE > PAGE_SIZE
>> which is unlikely but not impossible, we could configure the kernel for
>> 4K system pages and 64K IOMMU pages I suppose. Do we really want to do
>> this here, or simply put WARN_ON(tbl->it_page_shift > PAGE_SHIFT)?
> 
> I think it would be better to keep the code as much generic as possible
> regarding page sizes. 

Then you need to test it. Does 4K guest even boot (it should but I would
not bet much on it)?

> 
>> Because if we want the former (==support), then we'll have to align the
>> size up to the bigger page size when allocating/zeroing system pages,
>> etc. 
> 
> This part I don't understand. Why do we need to align everything to the
> bigger pagesize? 
> 
> I mean, is not that enough that the range [ret, ret + size[ is both
> allocated by mm and mapped on a iommu range?
> 
> Suppose a iommu_alloc_coherent() of 16kB on PAGESIZE = 4k and
> IOMMU_PAGE_SIZE() == 64k.
> Why 4 * cpu_pages mapped by a 64k IOMMU page is not enough? 
> All the space the user asked for is allocated and mapped for DMA.


The user asked to map 16K, the rest - 48K - is used for something else
(may be even mapped to another device) but you are making all 64K
accessible by the device which only should be able to access 16K.

In practice, if this happens, H_PUT_TCE will simply fail.


> 
>> Bigger pages are not the case here as I understand it.
> 
> I did not get this part, what do you mean?


Possible IOMMU page sizes are 4K, 64K, 2M, 16M, 256M, 1GB, and the
supported set of sizes is different for P8/P9 and type of IO (PHB,
NVLink/CAPI).


> 
>>> Update those functions to guarantee alignment with requested size
>>> using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().
>>>
>>> Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
>>> with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
>>>
>>> Signed-off-by: Leonardo Bras 
>>> ---
>>>  arch/powerpc/kernel/iommu.c | 17 +
>>>  1 file changed, 9 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>> index 9704f3f76e63..d7086087830f 100644
>>> --- a/arch/powerpc/kernel/iommu.c
>>> +++ b/arch/powerpc/kernel/iommu.c
>>> @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device 
>>> *dev,
>>> }
>>>  
>>> if (dev)
>>> -   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
>>> - 1 << tbl->it_page_shift);
>>> +   boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, 
>>> tbl);
>>
>> Run checkpatch.pl, should complain about a long line.
> 
> It's 86 columns long, which is less than the new limit of 100 columns
> Linus announced a few weeks ago. checkpatch.pl was updated too:
> https://www.phoronix.com/scan.php?page=news_item=Linux-Kernel-Deprecates-80-Col

Yay finally :) Thanks,


> 
>>
>>
>>> else
>>> -   boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
>>> +   boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
>>> /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
>>>  
>>> n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
>>> @@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, struct 
>>> iommu_table *tbl,
>>> unsigned int order;
>>> unsigned int nio_pages, io_order;
>>> struct page *page;
>>> +   size_t size_io = size;
>>>  
>>> size = PAGE_ALIGN(size);
>>> order = get_order(size);
>>> @@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, struct 
>>> iommu_table *tbl,
>>> memset(ret, 0, size);
>>>  
>>> /* Set up tces to cover the allocated range */
>>> -   nio_pages = size >> tbl->it_page_shift;
>>> -   io_order = get_iommu_order(size, tbl);
>>> +   size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
>>> +   nio_pages = size_io >> tbl->it_page_shift;
>>> +   io_order = get_iommu_order(size_io, tbl);
>>> mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
>>>   mask >> tbl->it_page_shift, io_order, 0);
>>> if (mapping == DMA_MAPPING_ERROR) {
>>> @@ -900,11 +901,11 @@ void iommu_free_coherent(struct iommu_table *tbl, 
>>> size_t size,
>>>  void *vaddr, dma_addr_t dma_handle)
>>>  {
>>> if (tbl) {
>>> -   unsigned int nio_pages;
>>> +   size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
>>> +   unsigned int nio_pages = size_io >> tbl->it_page_shift;
>>>  
>>> -   size = PAGE_ALIGN(size);
>>> -   nio_pages = size >> tbl->it_page_shift;
>>> iommu_free(tbl, dma_handle, nio_pages);
>>> +
>>

Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-08-27 Thread Leonardo Bras
On Sat, 2020-08-22 at 20:07 +1000, Alexey Kardashevskiy wrote:
> 
> On 18/08/2020 09:40, Leonardo Bras wrote:
> > Both iommu_alloc_coherent() and iommu_free_coherent() assume that once
> > size is aligned to PAGE_SIZE it will be aligned to IOMMU_PAGE_SIZE.
> 
> The only case when it is not aligned is when IOMMU_PAGE_SIZE > PAGE_SIZE
> which is unlikely but not impossible, we could configure the kernel for
> 4K system pages and 64K IOMMU pages I suppose. Do we really want to do
> this here, or simply put WARN_ON(tbl->it_page_shift > PAGE_SHIFT)?

I think it would be better to keep the code as much generic as possible
regarding page sizes. 

> Because if we want the former (==support), then we'll have to align the
> size up to the bigger page size when allocating/zeroing system pages,
> etc. 

This part I don't understand. Why do we need to align everything to the
bigger pagesize? 

I mean, is not that enough that the range [ret, ret + size[ is both
allocated by mm and mapped on a iommu range?

Suppose a iommu_alloc_coherent() of 16kB on PAGESIZE = 4k and
IOMMU_PAGE_SIZE() == 64k.
Why 4 * cpu_pages mapped by a 64k IOMMU page is not enough? 
All the space the user asked for is allocated and mapped for DMA.
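
For concreteness, the arithmetic at stake in exactly that scenario, as a
standalone illustration (it mirrors the shifts used by
iommu_alloc_coherent() before and after the patch; not kernel code):

#include <stdio.h>

int main(void)
{
        unsigned long size = 16 * 1024;         /* requested 16kB   */
        unsigned long page_shift = 12;          /* 4k system pages  */
        unsigned long it_page_shift = 16;       /* 64k IOMMU pages  */

        /* pre-patch: size is only aligned to the system page size */
        unsigned long aligned = (size + (1UL << page_shift) - 1) & ~((1UL << page_shift) - 1);
        /* patched: size is aligned to the IOMMU page size first */
        unsigned long size_io = (size + (1UL << it_page_shift) - 1) & ~((1UL << it_page_shift) - 1);

        printf("pre-patch: nio_pages = %lu\n", aligned >> it_page_shift); /* 0 */
        printf("patched:   nio_pages = %lu\n", size_io >> it_page_shift); /* 1 */
        return 0;
}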


> Bigger pages are not the case here as I understand it.

I did not get this part, what do you mean?

> > Update those functions to guarantee alignment with requested size
> > using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().
> > 
> > Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
> > with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
> > 
> > Signed-off-by: Leonardo Bras 
> > ---
> >  arch/powerpc/kernel/iommu.c | 17 +
> >  1 file changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > index 9704f3f76e63..d7086087830f 100644
> > --- a/arch/powerpc/kernel/iommu.c
> > +++ b/arch/powerpc/kernel/iommu.c
> > @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device 
> > *dev,
> > }
> >  
> > if (dev)
> > -   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > - 1 << tbl->it_page_shift);
> > +   boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, 
> > tbl);
> 
> Run checkpatch.pl, should complain about a long line.

It's 86 columns long, which is less than the new limit of 100 columns
Linus announced a few weeks ago. checkpatch.pl was updated too:
https://www.phoronix.com/scan.php?page=news_item=Linux-Kernel-Deprecates-80-Col


> 
> 
> > else
> > -   boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
> > +   boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
> > /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
> >  
> > n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
> > @@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, struct 
> > iommu_table *tbl,
> > unsigned int order;
> > unsigned int nio_pages, io_order;
> > struct page *page;
> > +   size_t size_io = size;
> >  
> > size = PAGE_ALIGN(size);
> > order = get_order(size);
> > @@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, struct 
> > iommu_table *tbl,
> > memset(ret, 0, size);
> >  
> > /* Set up tces to cover the allocated range */
> > -   nio_pages = size >> tbl->it_page_shift;
> > -   io_order = get_iommu_order(size, tbl);
> > +   size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
> > +   nio_pages = size_io >> tbl->it_page_shift;
> > +   io_order = get_iommu_order(size_io, tbl);
> > mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
> >   mask >> tbl->it_page_shift, io_order, 0);
> > if (mapping == DMA_MAPPING_ERROR) {
> > @@ -900,11 +901,11 @@ void iommu_free_coherent(struct iommu_table *tbl, 
> > size_t size,
> >  void *vaddr, dma_addr_t dma_handle)
> >  {
> > if (tbl) {
> > -   unsigned int nio_pages;
> > +   size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
> > +   unsigned int nio_pages = size_io >> tbl->it_page_shift;
> >  
> > -   size = PAGE_ALIGN(size);
> > -   nio_pages = size >> tbl->it_page_shift;
> > iommu_free(tbl, dma_handle, nio_pages);
> > +
> 
> Unrelated new line.

Will be removed. Thanks!

> 
> 
> > size = PAGE_ALIGN(size);
> > free_pages((unsigned long)vaddr, get_order(size));
> > }
> > 



Re: [PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-08-22 Thread Alexey Kardashevskiy



On 18/08/2020 09:40, Leonardo Bras wrote:
> Both iommu_alloc_coherent() and iommu_free_coherent() assume that once
> size is aligned to PAGE_SIZE it will be aligned to IOMMU_PAGE_SIZE.

The only case when it is not aligned is when IOMMU_PAGE_SIZE > PAGE_SIZE
which is unlikely but not impossible, we could configure the kernel for
4K system pages and 64K IOMMU pages I suppose. Do we really want to do
this here, or simply put WARN_ON(tbl->it_page_shift > PAGE_SHIFT)?
Because if we want the former (==support), then we'll have to align the
size up to the bigger page size when allocating/zeroing system pages,
etc. Bigger pages are not the case here as I understand it.
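
A minimal sketch of the WARN_ON alternative mentioned above (the early
return and the placement inside iommu_alloc_coherent() are illustrative
assumptions, not part of the quoted suggestion):

        /* Flag and refuse the IOMMU-page-larger-than-system-page case
         * instead of supporting it. */
        if (WARN_ON(tbl->it_page_shift > PAGE_SHIFT))
                return NULL;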


> 
> Update those functions to guarantee alignment with requested size
> using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().
> 
> Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
> with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/kernel/iommu.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 9704f3f76e63..d7086087830f 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device 
> *dev,
>   }
>  
>   if (dev)
> - boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> -   1 << tbl->it_page_shift);
> + boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, 
> tbl);


Run checkpatch.pl, should complain about a long line.


>   else
> - boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
> + boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
>   /* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
>  
>   n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
> @@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, struct 
> iommu_table *tbl,
>   unsigned int order;
>   unsigned int nio_pages, io_order;
>   struct page *page;
> + size_t size_io = size;
>  
>   size = PAGE_ALIGN(size);
>   order = get_order(size);
> @@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, struct 
> iommu_table *tbl,
>   memset(ret, 0, size);
>  
>   /* Set up tces to cover the allocated range */
> - nio_pages = size >> tbl->it_page_shift;
> - io_order = get_iommu_order(size, tbl);
> + size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
> + nio_pages = size_io >> tbl->it_page_shift;
> + io_order = get_iommu_order(size_io, tbl);
>   mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
> mask >> tbl->it_page_shift, io_order, 0);
>   if (mapping == DMA_MAPPING_ERROR) {
> @@ -900,11 +901,11 @@ void iommu_free_coherent(struct iommu_table *tbl, 
> size_t size,
>void *vaddr, dma_addr_t dma_handle)
>  {
>   if (tbl) {
> - unsigned int nio_pages;
> + size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
> + unsigned int nio_pages = size_io >> tbl->it_page_shift;
>  
> - size = PAGE_ALIGN(size);
> - nio_pages = size >> tbl->it_page_shift;
>   iommu_free(tbl, dma_handle, nio_pages);
> +

Unrelated new line.


>   size = PAGE_ALIGN(size);
>   free_pages((unsigned long)vaddr, get_order(size));
>   }
> 

-- 
Alexey


[PATCH v1 02/10] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE on iommu_*_coherent()

2020-08-17 Thread Leonardo Bras
Both iommu_alloc_coherent() and iommu_free_coherent() assume that once
size is aligned to PAGE_SIZE it will be aligned to IOMMU_PAGE_SIZE.

Update those functions to guarantee alignment with requested size
using IOMMU_PAGE_ALIGN() before doing iommu_alloc() / iommu_free().

Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
with IOMMU_PAGE_ALIGN(n, tbl), which seems easier to read.
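
For reference, IOMMU_PAGE_SIZE() and IOMMU_PAGE_ALIGN() come from
arch/powerpc/include/asm/iommu.h and expand to roughly the following
(paraphrased, so treat the exact form as an approximation):

#define IOMMU_PAGE_SIZE(tblptr)        (1UL << (tblptr)->it_page_shift)
#define IOMMU_PAGE_ALIGN(addr, tblptr) ALIGN(addr, IOMMU_PAGE_SIZE(tblptr))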

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/kernel/iommu.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 9704f3f76e63..d7086087830f 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device *dev,
}
 
if (dev)
-   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
- 1 << tbl->it_page_shift);
+   boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, 
tbl);
else
-   boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
+   boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
/* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
 
n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
@@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, struct 
iommu_table *tbl,
unsigned int order;
unsigned int nio_pages, io_order;
struct page *page;
+   size_t size_io = size;
 
size = PAGE_ALIGN(size);
order = get_order(size);
@@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, struct 
iommu_table *tbl,
memset(ret, 0, size);
 
/* Set up tces to cover the allocated range */
-   nio_pages = size >> tbl->it_page_shift;
-   io_order = get_iommu_order(size, tbl);
+   size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
+   nio_pages = size_io >> tbl->it_page_shift;
+   io_order = get_iommu_order(size_io, tbl);
mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
  mask >> tbl->it_page_shift, io_order, 0);
if (mapping == DMA_MAPPING_ERROR) {
@@ -900,11 +901,11 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t 
size,
 void *vaddr, dma_addr_t dma_handle)
 {
if (tbl) {
-   unsigned int nio_pages;
+   size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
+   unsigned int nio_pages = size_io >> tbl->it_page_shift;
 
-   size = PAGE_ALIGN(size);
-   nio_pages = size >> tbl->it_page_shift;
iommu_free(tbl, dma_handle, nio_pages);
+
size = PAGE_ALIGN(size);
free_pages((unsigned long)vaddr, get_order(size));
}
-- 
2.25.4