Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping

2021-04-13 Thread Leonardo Bras
On Tue, 2021-04-13 at 18:24 +1000, Alexey Kardashevskiy wrote:
> 
> On 13/04/2021 17:58, Leonardo Bras wrote:
> > On Tue, 2021-04-13 at 17:41 +1000, Alexey Kardashevskiy wrote:
> > > 
> > > On 13/04/2021 17:33, Leonardo Bras wrote:
> > > > On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:
> > > > > 
> > > > > On 13/04/2021 15:49, Leonardo Bras wrote:
> > > > > > Thanks for the feedback!
> > > > > > 
> > > > > > On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > > > > > > > -static bool find_existing_ddw(struct device_node *pdn, u64 
> > > > > > > > *dma_addr)
> > > > > > > > +static phys_addr_t ddw_memory_hotplug_max(void)
> > > > > > > 
> > > > > > > 
> > > > > > > Please, forward declaration or a separate patch; this creates
> > > > > > > unnecessary noise to the actual change.
> > > > > > > 
> > > > > > 
> > > > > > Sure, done!
> > > > > > 
> > > > > > > 
> > > > > > > > +   _iommu_table_setparms(tbl, 
> > > > > > > > pci->phb->bus->number, create.liobn, win_addr,
> > > > > > > > + 1UL << len, page_shift, 
> > > > > > > > 0, _table_lpar_multi_ops);
> > > > > > > > +   iommu_init_table(tbl, pci->phb->node, 0, 0);
> > > > > > > 
> > > > > > > 
> > > > > > > It is 0,0 only if win_addr>0 which is not the QEMU case.
> > > > > > > 
> > > > > > 
> > > > > > Oh, ok.
> > > > > > I previously though it was ok to use 0,0 here as any other usage in
> > > > > > this file was also 0,0.
> > > > > > 
> > > > > > What should I use to get the correct parameters? Use the previous 
> > > > > > tbl
> > > > > > it_reserved_start and tbl->it_reserved_end is enough?
> > > > > 
> > > > > depends on whether you carry reserved start/end even if they are 
> > > > > outside
> > > > > of the dma window.
> > > > > 
> > > > 
> > > > Oh, that makes sense.
> > > > On a previous patch (5/14 IIRC), I changed the behavior to only store
> > > > the valid range on tbl, but now I understand why it's important to
> > > > store the raw value.
> > > > 
> > > > Ok, I will change it back so the reserved range stays in tbl even if it
> > > > does not intersect with the DMA window. This way I can reuse the values
> > > > in case of indirect mapping with DDW.
> > > > 
> > > > Is that ok? Are the reserved values are supposed to stay the same after
> > > > changing from Default DMA window to DDW?
> > > 
> > > I added them to know what bits in it_map to ignore when checking if
> > > there is any active user of the table. If you have non zero reserved
> > > start/end but they do not affect it_map, then it is rather weird way to
> > > carry reserved start/end from DDW to no-DDW.
> > > 
> > 
> > Ok, agreed.
> > 
> > >   May be do not set these at
> > > all for DDW with window start at 1<<59 and when going back to no-DDW (or
> > > if DDW starts at 0) - just set them from MMIO32, just as they are
> > > initialized in the first place.
> > > 
> > 
> > If I get it correctly from pci_of_scan.c, MMIO32 = {0, 32MB}, is that
> > correct?
> 
> No, under QEMU it is 0x8000.-0x1..:
> 
> /proc/device-tree/pci@8002000/ranges
> 
> 7 cells for each resource, the second one is MMIO32 (the first is IO 
> ports, the last is 64bit MMIO).
> > 
> > So, if DDW starts at any value in this range (most probably at zero),
> > we should remove the rest, is that correct?
> > 
> > Could it always use iommu_init_table(..., 0, 32MB) here, so it always
> > reserve any part of the DMA window that's in this range? Ot there may
> > be other reserved values range?
> > 
> > > and when going back to no-DDW
> > 
> > After iommu_init_table() there should be no failure, so it looks like
> > there is no 'going back to no-DDW'. Am I missing something?
> 
> Well, a random driver could request 32bit DMA and if the new window is 
> 1:1, then it would break but this does not seem to happen and we do not 
> support it anyway so no loss here.
> 

So you would recommend reading "ranges" with of_get_property() and
using the second entry (cells 7 - 13) in this point, get base & size to
make sure it does not map anything here? (should have no effect if the
value does not intersect with the DMA window)

Thank you for reviewing!
Leonardo Bras



Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping

2021-04-13 Thread Alexey Kardashevskiy




On 13/04/2021 17:58, Leonardo Bras wrote:

On Tue, 2021-04-13 at 17:41 +1000, Alexey Kardashevskiy wrote:


On 13/04/2021 17:33, Leonardo Bras wrote:

On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:


On 13/04/2021 15:49, Leonardo Bras wrote:

Thanks for the feedback!

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:

-static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
+static phys_addr_t ddw_memory_hotplug_max(void)



Please, forward declaration or a separate patch; this creates
unnecessary noise to the actual change.



Sure, done!




+   _iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, 
win_addr,
+ 1UL << len, page_shift, 0, 
_table_lpar_multi_ops);
+   iommu_init_table(tbl, pci->phb->node, 0, 0);



It is 0,0 only if win_addr>0 which is not the QEMU case.



Oh, ok.
I previously though it was ok to use 0,0 here as any other usage in
this file was also 0,0.

What should I use to get the correct parameters? Use the previous tbl
it_reserved_start and tbl->it_reserved_end is enough?


depends on whether you carry reserved start/end even if they are outside
of the dma window.



Oh, that makes sense.
On a previous patch (5/14 IIRC), I changed the behavior to only store
the valid range on tbl, but now I understand why it's important to
store the raw value.

Ok, I will change it back so the reserved range stays in tbl even if it
does not intersect with the DMA window. This way I can reuse the values
in case of indirect mapping with DDW.

Is that ok? Are the reserved values are supposed to stay the same after
changing from Default DMA window to DDW?


I added them to know what bits in it_map to ignore when checking if
there is any active user of the table. If you have non zero reserved
start/end but they do not affect it_map, then it is rather weird way to
carry reserved start/end from DDW to no-DDW.



Ok, agreed.


  May be do not set these at
all for DDW with window start at 1<<59 and when going back to no-DDW (or
if DDW starts at 0) - just set them from MMIO32, just as they are
initialized in the first place.



If I get it correctly from pci_of_scan.c, MMIO32 = {0, 32MB}, is that
correct?


No, under QEMU it is 0x8000.-0x1..:

/proc/device-tree/pci@8002000/ranges

7 cells for each resource, the second one is MMIO32 (the first is IO 
ports, the last is 64bit MMIO).




So, if DDW starts at any value in this range (most probably at zero),
we should remove the rest, is that correct?

Could it always use iommu_init_table(..., 0, 32MB) here, so it always
reserve any part of the DMA window that's in this range? Ot there may
be other reserved values range?


and when going back to no-DDW


After iommu_init_table() there should be no failure, so it looks like
there is no 'going back to no-DDW'. Am I missing something?


Well, a random driver could request 32bit DMA and if the new window is 
1:1, then it would break but this does not seem to happen and we do not 
support it anyway so no loss here.



--
Alexey


Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping

2021-04-13 Thread Leonardo Bras
On Tue, 2021-04-13 at 17:41 +1000, Alexey Kardashevskiy wrote:
> 
> On 13/04/2021 17:33, Leonardo Bras wrote:
> > On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:
> > > 
> > > On 13/04/2021 15:49, Leonardo Bras wrote:
> > > > Thanks for the feedback!
> > > > 
> > > > On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > > > > > -static bool find_existing_ddw(struct device_node *pdn, u64 
> > > > > > *dma_addr)
> > > > > > +static phys_addr_t ddw_memory_hotplug_max(void)
> > > > > 
> > > > > 
> > > > > Please, forward declaration or a separate patch; this creates
> > > > > unnecessary noise to the actual change.
> > > > > 
> > > > 
> > > > Sure, done!
> > > > 
> > > > > 
> > > > > > +   _iommu_table_setparms(tbl, pci->phb->bus->number, 
> > > > > > create.liobn, win_addr,
> > > > > > + 1UL << len, page_shift, 0, 
> > > > > > _table_lpar_multi_ops);
> > > > > > +   iommu_init_table(tbl, pci->phb->node, 0, 0);
> > > > > 
> > > > > 
> > > > > It is 0,0 only if win_addr>0 which is not the QEMU case.
> > > > > 
> > > > 
> > > > Oh, ok.
> > > > I previously though it was ok to use 0,0 here as any other usage in
> > > > this file was also 0,0.
> > > > 
> > > > What should I use to get the correct parameters? Use the previous tbl
> > > > it_reserved_start and tbl->it_reserved_end is enough?
> > > 
> > > depends on whether you carry reserved start/end even if they are outside
> > > of the dma window.
> > > 
> > 
> > Oh, that makes sense.
> > On a previous patch (5/14 IIRC), I changed the behavior to only store
> > the valid range on tbl, but now I understand why it's important to
> > store the raw value.
> > 
> > Ok, I will change it back so the reserved range stays in tbl even if it
> > does not intersect with the DMA window. This way I can reuse the values
> > in case of indirect mapping with DDW.
> > 
> > Is that ok? Are the reserved values are supposed to stay the same after
> > changing from Default DMA window to DDW?
> 
> I added them to know what bits in it_map to ignore when checking if 
> there is any active user of the table. If you have non zero reserved 
> start/end but they do not affect it_map, then it is rather weird way to 
> carry reserved start/end from DDW to no-DDW.
> 

Ok, agreed.

>  May be do not set these at 
> all for DDW with window start at 1<<59 and when going back to no-DDW (or 
> if DDW starts at 0) - just set them from MMIO32, just as they are 
> initialized in the first place.
> 

If I get it correctly from pci_of_scan.c, MMIO32 = {0, 32MB}, is that
correct?

So, if DDW starts at any value in this range (most probably at zero),
we should remove the rest, is that correct?

Could it always use iommu_init_table(..., 0, 32MB) here, so it always
reserve any part of the DMA window that's in this range? Ot there may
be other reserved values range?

> and when going back to no-DDW 

After iommu_init_table() there should be no failure, so it looks like
there is no 'going back to no-DDW'. Am I missing something?

Thanks for helping!

Best regards,
Leonardo Bras



Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping

2021-04-13 Thread Alexey Kardashevskiy




On 13/04/2021 17:33, Leonardo Bras wrote:

On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:


On 13/04/2021 15:49, Leonardo Bras wrote:

Thanks for the feedback!

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:

-static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
+static phys_addr_t ddw_memory_hotplug_max(void)



Please, forward declaration or a separate patch; this creates
unnecessary noise to the actual change.



Sure, done!




+   _iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, 
win_addr,
+ 1UL << len, page_shift, 0, 
_table_lpar_multi_ops);
+   iommu_init_table(tbl, pci->phb->node, 0, 0);



It is 0,0 only if win_addr>0 which is not the QEMU case.



Oh, ok.
I previously though it was ok to use 0,0 here as any other usage in
this file was also 0,0.

What should I use to get the correct parameters? Use the previous tbl
it_reserved_start and tbl->it_reserved_end is enough?


depends on whether you carry reserved start/end even if they are outside
of the dma window.



Oh, that makes sense.
On a previous patch (5/14 IIRC), I changed the behavior to only store
the valid range on tbl, but now I understand why it's important to
store the raw value.

Ok, I will change it back so the reserved range stays in tbl even if it
does not intersect with the DMA window. This way I can reuse the values
in case of indirect mapping with DDW.

Is that ok? Are the reserved values are supposed to stay the same after
changing from Default DMA window to DDW?


I added them to know what bits in it_map to ignore when checking if 
there is any active user of the table. If you have non zero reserved 
start/end but they do not affect it_map, then it is rather weird way to 
carry reserved start/end from DDW to no-DDW. May be do not set these at 
all for DDW with window start at 1<<59 and when going back to no-DDW (or 
if DDW starts at 0) - just set them from MMIO32, just as they are 
initialized in the first place.




--
Alexey


Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping

2021-04-13 Thread Leonardo Bras
On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:
> 
> On 13/04/2021 15:49, Leonardo Bras wrote:
> > Thanks for the feedback!
> > 
> > On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > > > -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> > > > +static phys_addr_t ddw_memory_hotplug_max(void)
> > > 
> > > 
> > > Please, forward declaration or a separate patch; this creates
> > > unnecessary noise to the actual change.
> > > 
> > 
> > Sure, done!
> > 
> > > 
> > > > +   _iommu_table_setparms(tbl, pci->phb->bus->number, 
> > > > create.liobn, win_addr,
> > > > + 1UL << len, page_shift, 0, 
> > > > _table_lpar_multi_ops);
> > > > +   iommu_init_table(tbl, pci->phb->node, 0, 0);
> > > 
> > > 
> > > It is 0,0 only if win_addr>0 which is not the QEMU case.
> > > 
> > 
> > Oh, ok.
> > I previously though it was ok to use 0,0 here as any other usage in
> > this file was also 0,0.
> > 
> > What should I use to get the correct parameters? Use the previous tbl
> > it_reserved_start and tbl->it_reserved_end is enough?
> 
> depends on whether you carry reserved start/end even if they are outside 
> of the dma window.
> 

Oh, that makes sense.
On a previous patch (5/14 IIRC), I changed the behavior to only store
the valid range on tbl, but now I understand why it's important to
store the raw value.

Ok, I will change it back so the reserved range stays in tbl even if it
does not intersect with the DMA window. This way I can reuse the values
in case of indirect mapping with DDW.

Is that ok? Are the reserved values are supposed to stay the same after
changing from Default DMA window to DDW?

Best regards,
Leonardo Bras



Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping

2021-04-13 Thread Alexey Kardashevskiy




On 13/04/2021 15:49, Leonardo Bras wrote:

Thanks for the feedback!

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:

-static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
+static phys_addr_t ddw_memory_hotplug_max(void)



Please, forward declaration or a separate patch; this creates
unnecessary noise to the actual change.



Sure, done!




+   _iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, 
win_addr,
+ 1UL << len, page_shift, 0, 
_table_lpar_multi_ops);
+   iommu_init_table(tbl, pci->phb->node, 0, 0);



It is 0,0 only if win_addr>0 which is not the QEMU case.



Oh, ok.
I previously though it was ok to use 0,0 here as any other usage in
this file was also 0,0.

What should I use to get the correct parameters? Use the previous tbl
it_reserved_start and tbl->it_reserved_end is enough?


depends on whether you carry reserved start/end even if they are outside 
of the dma window.



--
Alexey


Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping

2021-04-12 Thread Leonardo Bras
Thanks for the feedback!

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> > +static phys_addr_t ddw_memory_hotplug_max(void)
> 
> 
> Please, forward declaration or a separate patch; this creates 
> unnecessary noise to the actual change.
> 

Sure, done!

> 
> > +   _iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, 
> > win_addr,
> > + 1UL << len, page_shift, 0, 
> > _table_lpar_multi_ops);
> > +   iommu_init_table(tbl, pci->phb->node, 0, 0);
> 
> 
> It is 0,0 only if win_addr>0 which is not the QEMU case.
> 

Oh, ok.
I previously though it was ok to use 0,0 here as any other usage in
this file was also 0,0. 

What should I use to get the correct parameters? Use the previous tbl
it_reserved_start and tbl->it_reserved_end is enough?

Best regards,
Leonardo Bras
> 



Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping

2020-09-28 Thread Alexey Kardashevskiy




On 12/09/2020 03:07, Leonardo Bras wrote:

Cc: linuxppc-dev@lists.ozlabs.org, linux-ker...@vger.kernel.org,

So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds VFs and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per a PE to limit waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.

The default DMA window uses 4k pages instead of 64k pages, and since
the amount of pages (TCEs) may stay the same (on pHyp case), making
use of DDW instead of the default DMA window for indirect mapping will
expand in up to 16x the amount of memory that can be mapped on DMA.

Indirect mapping will only be used if direct mapping is not a
possibility.

For indirect mapping, it's necessary to re-create the iommu_table with
the new DMA window parameters, so iommu_alloc() can use it.

Removing the default DMA window for using DDW with indirect mapping
is only allowed if there is no current IOMMU memory allocated in
the iommu_table. enable_ddw() is aborted otherwise.

Even though there won't be both direct and indirect mappings at the
same time, we can't reuse the DIRECT64_PROPNAME property name, or else
an older kexec()ed kernel can assume direct mapping, and skip
iommu_alloc(), causing undesirable behavior.
So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
was created to represent a DDW that does not allow direct mapping.

Note: ddw_memory_hotplug_max() was moved up so it can be used in
find_existing_ddw().

Signed-off-by: Leonardo Bras 
---
  arch/powerpc/platforms/pseries/iommu.c | 160 -
  1 file changed, 103 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 9b7c03652e72..c4de23080d1b 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -375,6 +375,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
  /* protects initializing window twice for same device */
  static DEFINE_MUTEX(direct_window_init_mutex);
  #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
  
  static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,

unsigned long num_pfn, const void *arg)
@@ -846,10 +847,48 @@ static int remove_ddw(struct device_node *np, bool 
remove_prop, const char *win_
return 0;
  }
  
-static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)

+static phys_addr_t ddw_memory_hotplug_max(void)



Please, forward declaration or a separate patch; this creates 
unnecessary noise to the actual change.




+{
+   phys_addr_t max_addr = memory_hotplug_max();
+   struct device_node *memory;
+
+   /*
+* The "ibm,pmemory" can appear anywhere in the address space.
+* Assuming it is still backed by page structs, set the upper limit
+* for the huge DMA window as MAX_PHYSMEM_BITS.
+*/
+   if (of_find_node_by_type(NULL, "ibm,pmemory"))
+   return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
+   (phys_addr_t)-1 : (1ULL << MAX_PHYSMEM_BITS);
+
+   for_each_node_by_type(memory, "memory") {
+   unsigned long start, size;
+   int n_mem_addr_cells, n_mem_size_cells, len;
+   const __be32 *memcell_buf;
+
+   memcell_buf = of_get_property(memory, "reg", );
+   if (!memcell_buf || len <= 0)
+   continue;
+
+   n_mem_addr_cells = of_n_addr_cells(memory);
+   n_mem_size_cells = of_n_size_cells(memory);
+
+   start = of_read_number(memcell_buf, n_mem_addr_cells);
+   memcell_buf += n_mem_addr_cells;
+   size = of_read_number(memcell_buf, n_mem_size_cells);
+   memcell_buf += n_mem_size_cells;
+
+   max_addr = max_t(phys_addr_t, max_addr, start + size);
+   }
+
+   return max_addr;
+}
+
+static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool 
*direct_mapping)
  {
struct direct_window *window;
const struct dynamic_dma_window_prop *direct64;
+   unsigned long window_size;
bool found = false;
  
  	spin_lock(_window_list_lock);

@@ -858,6 +897,10 @@ static bool find_existing_ddw(struct device_node *pdn, u64 
*dma_addr)
if (window->device == pdn) {
direct64 = window->prop;
*dma_addr = be64_to_cpu(direct64->dma_base);
+
+   window_size = (1UL << 
be32_to_cpu(direct64->window_shift));
+   *direct_mapping = (window_size >= 
ddw_memory_hotplug_max());
+