Re: [PATCH v1 09/10] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition

2020-09-03 Thread Alexey Kardashevskiy




On 02/09/2020 16:11, Leonardo Bras wrote:

On Mon, 2020-08-31 at 14:35 +1000, Alexey Kardashevskiy wrote:


On 29/08/2020 04:36, Leonardo Bras wrote:

On Mon, 2020-08-24 at 15:17 +1000, Alexey Kardashevskiy wrote:

On 18/08/2020 09:40, Leonardo Bras wrote:

As of today, if the biggest DDW that can be created can't map the whole
partition, its creation is skipped and the default DMA window
"ibm,dma-window" is used instead.

DDW is 16x bigger than the default DMA window,


16x only under very specific circumstances which are
1. phyp
2. sriov
3. device class in hmc (or what that priority number is in the lpar config).


Yeah, missing details.


having the same amount of
pages, but increasing the page size to 64k.
Besides larger DMA window,


"Besides being larger"?


You are right there.


it performs better for allocations over 4k,


Better how?


I was thinking of allocations larger than (512 * 4k), since more than
one hypercall is needed there, while with 64k pages a single hypercall
still covers up to (512 * 64k).
But yeah, not the usual case anyway.
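
(For illustration, a minimal sketch of that arithmetic, assuming
H_PUT_TCE_INDIRECT can program up to 512 TCEs per call; the helper name
is made up:)

/* Hypothetical helper: hypercalls needed to map `size` bytes when
 * each H_PUT_TCE_INDIRECT call can program up to 512 TCEs. */
static unsigned long tce_hcalls(unsigned long size, unsigned int page_shift)
{
	unsigned long tces = (size + (1UL << page_shift) - 1) >> page_shift;

	return (tces + 511) / 512;
}

/* tce_hcalls(4UL << 20, 12) == 2	4MB mapping, 4k IOMMU pages
 * tce_hcalls(4UL << 20, 16) == 1	4MB mapping, 64k IOMMU pages */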


Yup.



so it would be nice to use it instead.


I'd rather say something like:
===
So far we assumed we can map the guest RAM 1:1 to the bus which worked
with a small number of devices. SRIOV changes it as the user can
configure hundreds of VFs, and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit waste of physical pages.
===
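
(To make the waste concrete, a back-of-the-envelope figure, assuming the
usual 8 bytes per TCE:

	1TB window / 64K pages = 2^24 TCEs; 2^24 * 8B = 128MB of
	preallocated TCE table per PE, times hundreds of VFs.)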


I mixed this into my commit message; it looks like this:

===
powerpc/pseries/iommu: Make use of DDW for indirect mapping

So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds of VFs, and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW
creation is skipped and the default DMA window "ibm,dma-window" is used
instead.

The default DMA window uses 4k pages instead of 64k pages, and since
the amount of pages is the same,


Is the amount really the same? I thought you can prioritize some VFs
over others (== allocate different number of TCEs). Does it really
matter if it is the same?


In a conversation with Travis Pizel, he explained how it's supposed to
work, and I understood this:

When a VF is created, it will be assigned a capacity, like 4%, 20%, and
so on. The number of 'TCE entries' available to that partition
is proportional to that capacity.

If we use the default DMA window, the IOMMU pagesize/entry will be 4k,
and if we use DDW, we will get a 64k pagesize. As the number of entries
will be the same (for the same capacity), the total space that can be
addressed by the IOMMU will be 16 times bigger. This sometimes enables
direct mapping, but sometimes it's still not enough.
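
(In numbers, for a fixed number of entries; 131072 is just an assumed
example consistent with the sizes quoted below:

	131072 entries * 4K  = 512MB window
	131072 entries * 64K = 8GB window, i.e. 16x)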



Good to know. This is still an implementation detail; QEMU does not
allocate TCEs like this.

In Travis' words:
"A low capacity VF, with less resources available, will certainly have
less DMA window capability than a high capacity VF. But, an 8GB DMA
window (with 64k pages) is still 16x larger than a 512MB window (with
4K pages).
A high capacity VF - for example, one that Leonardo has in his scenario
- will go from 8GB (using 4K pages) to 128GB (using 64K pages) - again,
16x larger - but it's obviously still possible to create a partition
that exceeds 128GB of memory in size."



Right, except the default DMA window is not 8GB, it is <= 2GB.


making use of DDW instead of the
default DMA window for indirect mapping will expand by 16x the amount
of memory that can be mapped for DMA.


Stop saying "16x", it is not guaranteed by anything :)



The DDW created will be used for direct mapping by default. [...]
===

What do you think?


The DDW created will be used for direct mapping by default.
If it's not available, indirect mapping will be used instead.

For indirect mapping, it's necessary to update the iommu_table so
iommu_alloc() can use the DDW created. For this,
iommu_table_update_window() is called when everything else succeeds
at enable_ddw().

Removing the default DMA window in order to use DDW with indirect mapping
is only allowed if there is no IOMMU memory currently allocated in
the iommu_table. enable_ddw() is aborted otherwise.

As there will never be both direct and indirect mappings at the same
time, the same property name can be used for the created DDW.

So renaming
define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
to
define DMA64_PROPNAME "linux,dma64-ddr-window-info"
looks like the right thing to do.
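
(A hypothetical shape for the iommu_table update mentioned above; the
field names are from struct iommu_table, but the real
iommu_table_update_window() in the patch may well differ:)

static void iommu_table_update_window(struct iommu_table *tbl, u64 dma_base,
				      u32 page_shift, u32 window_shift)
{
	/* Point the table at the new window and resize it in IOMMU pages. */
	tbl->it_offset = dma_base >> page_shift;
	tbl->it_page_shift = page_shift;
	tbl->it_size = 1UL << (window_shift - page_shift);
}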


I know I suggested this but this does not look so good anymore as I
suspect it breaks kexec (from older kernel to this one) so you either
need to check for both DT names or just keep the old one. Changing the
macro name is fine.



Yeah, having 'direct' in the name doesn't really make sense if it's used
for indirect mapping.

Re: [PATCH v1 09/10] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition

2020-09-02 Thread Leonardo Bras
On Mon, 2020-08-31 at 14:35 +1000, Alexey Kardashevskiy wrote:
> 
> On 29/08/2020 04:36, Leonardo Bras wrote:
> > On Mon, 2020-08-24 at 15:17 +1000, Alexey Kardashevskiy wrote:
> > > On 18/08/2020 09:40, Leonardo Bras wrote:
> > > > As of today, if the biggest DDW that can be created can't map the whole
> > > > partition, its creation is skipped and the default DMA window
> > > > "ibm,dma-window" is used instead.
> > > > 
> > > > DDW is 16x bigger than the default DMA window,
> > > 
> > > 16x only under very specific circumstances which are
> > > 1. phyp
> > > 2. sriov
> > > 3. device class in hmc (or what that priority number is in the lpar 
> > > config).
> > 
> > Yeah, missing details.
> > 
> > > > having the same amount of
> > > > pages, but increasing the page size to 64k.
> > > > Besides larger DMA window,
> > > 
> > > "Besides being larger"?
> > 
> > You are right there.
> > 
> > > > it performs better for allocations over 4k,
> > > 
> > > Better how?
> > 
> > I was thinking of allocations larger than (512 * 4k), since more than
> > one hypercall is needed there, while with 64k pages a single hypercall
> > still covers up to (512 * 64k).
> > But yeah, not the usual case anyway.
> 
> Yup.
> 
> 
> > > > so it would be nice to use it instead.
> > > 
> > > I'd rather say something like:
> > > ===
> > > So far we assumed we can map the guest RAM 1:1 to the bus which worked
> > > with a small number of devices. SRIOV changes it as the user can
> > > configure hundreds of VFs, and since phyp preallocates TCEs and does not
> > > allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> > > per PE to limit waste of physical pages.
> > > ===
> > 
> > I mixed this into my commit message; it looks like this:
> > 
> > ===
> > powerpc/pseries/iommu: Make use of DDW for indirect mapping
> > 
> > So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> > works with a small number of devices. SRIOV changes it as the user can
> > configure hundreds of VFs, and since phyp preallocates TCEs and does not
> > allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> > per PE to limit waste of physical pages.
> > 
> > As of today, if the assumed direct mapping is not possible, DDW
> > creation is skipped and the default DMA window "ibm,dma-window" is used
> > instead.
> > 
> > The default DMA window uses 4k pages instead of 64k pages, and since
> > the amount of pages is the same,
> 
> Is the amount really the same? I thought you can prioritize some VFs
> over others (== allocate different number of TCEs). Does it really
> matter if it is the same?

In a conversation with Travis Pizel, he explained how it's supposed to
work, and I understood this:

When a VF is created, it will be assigned a capacity, like 4%, 20%, and
so on. The number of 'TCE entries' available to that partition
is proportional to that capacity.

If we use the default DMA window, the IOMMU pagesize/entry will be 4k,
and if we use DDW, we will get a 64k pagesize. As the number of entries
will be the same (for the same capacity), the total space that can be
addressed by the IOMMU will be 16 times bigger. This sometimes enables
direct mapping, but sometimes it's still not enough.

In Travis' words:
"A low capacity VF, with less resources available, will certainly have
less DMA window capability than a high capacity VF. But, an 8GB DMA
window (with 64k pages) is still 16x larger than a 512MB window (with
4K pages).
A high capacity VF - for example, one that Leonardo has in his scenario
- will go from 8GB (using 4K pages) to 128GB (using 64K pages) - again,
16x larger - but it's obviously still possible to create a partition
that exceeds 128GB of memory in size."

> 
> 
> > making use of DDW instead of the
> > default DMA window for indirect mapping will expand by 16x the amount
> > of memory that can be mapped for DMA.
> 
> Stop saying "16x", it is not guaranteed by anything :)
> 
> 
> > The DDW created will be used for direct mapping by default. [...]
> > ===
> > 
> > What do you think?
> > 
> > > > The DDW created will be used for direct mapping by default.
> > > > If it's not available, indirect mapping will be used instead.
> > > > 
> > > > For indirect mapping, it's necessary to update the iommu_table so
> > > > iommu_alloc() can use the DDW created. For this,
> > > > iommu_table_update_window() is called when everything else succeeds
> > > > at enable_ddw().
> > > > 
> > > > Removing the default DMA window in order to use DDW with indirect mapping
> > > > is only allowed if there is no IOMMU memory currently allocated in
> > > > the iommu_table. enable_ddw() is aborted otherwise.
> > > > 
> > > > As there will never be both direct and indirect mappings at the same
> > > > time, the same property name can be used for the created DDW.
> > > > 
> > > > So renaming
> > > > define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> > > > to
> > > > define DMA64_PROPNAME "linux,dma64-ddr-window-info"
> > > > looks like the right thing to do.

Re: [PATCH v1 09/10] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition

2020-08-30 Thread Alexey Kardashevskiy



On 29/08/2020 04:36, Leonardo Bras wrote:
> On Mon, 2020-08-24 at 15:17 +1000, Alexey Kardashevskiy wrote:
>>
>> On 18/08/2020 09:40, Leonardo Bras wrote:
>>> As of today, if the biggest DDW that can be created can't map the whole
>>> partition, its creation is skipped and the default DMA window
>>> "ibm,dma-window" is used instead.
>>>
>>> DDW is 16x bigger than the default DMA window,
>>
>> 16x only under very specific circumstances which are
>> 1. phyp
>> 2. sriov
>> 3. device class in hmc (or what that priority number is in the lpar config).
> 
> Yeah, missing details.
> 
>>> having the same amount of
>>> pages, but increasing the page size to 64k.
>>> Besides larger DMA window,
>>
>> "Besides being larger"?
> 
> You are right there.
> 
>>
>>> it performs better for allocations over 4k,
>>
>> Better how?
> 
> I was thinking of allocations larger than (512 * 4k), since more than
> one hypercall is needed there, while with 64k pages a single hypercall
> still covers up to (512 * 64k).
> But yeah, not the usual case anyway.

Yup.


> 
>>
>>> so it would be nice to use it instead.
>>
>> I'd rather say something like:
>> ===
>> So far we assumed we can map the guest RAM 1:1 to the bus which worked
>> with a small number of devices. SRIOV changes it as the user can
>> configure hundreds of VFs, and since phyp preallocates TCEs and does not
>> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
>> per PE to limit waste of physical pages.
>> ===
> 
> I mixed this into my commit message; it looks like this:
> 
> ===
> powerpc/pseries/iommu: Make use of DDW for indirect mapping
> 
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds of VFs, and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per PE to limit waste of physical pages.
> 
> As of today, if the assumed direct mapping is not possible, DDW
> creation is skipped and the default DMA window "ibm,dma-window" is used
> instead.
> 
> The default DMA window uses 4k pages instead of 64k pages, and since
> the amount of pages is the same,


Is the amount really the same? I thought you can prioritize some VFs
over others (== allocate different number of TCEs). Does it really
matter if it is the same?


> making use of DDW instead of the
> default DMA window for indirect mapping will expand by 16x the amount
> of memory that can be mapped for DMA.

Stop saying "16x", it is not guaranteed by anything :)


> 
> The DDW created will be used for direct mapping by default. [...]
> ===
> 
> What do you think?
> 
>>> The DDW created will be used for direct mapping by default.
>>> If it's not available, indirect mapping will be used instead.
>>>
>>> For indirect mapping, it's necessary to update the iommu_table so
>>> iommu_alloc() can use the DDW created. For this,
>>> iommu_table_update_window() is called when everything else succeeds
>>> at enable_ddw().
>>>
>>> Removing the default DMA window in order to use DDW with indirect mapping
>>> is only allowed if there is no IOMMU memory currently allocated in
>>> the iommu_table. enable_ddw() is aborted otherwise.
>>>
>>> As there will never be both direct and indirect mappings at the same
>>> time, the same property name can be used for the created DDW.
>>>
>>> So renaming
>>> define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
>>> to
>>> define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>>> looks like the right thing to do.
>>
>> I know I suggested this but this does not look so good anymore as I
>> suspect it breaks kexec (from older kernel to this one) so you either
>> need to check for both DT names or just keep the old one. Changing the
>> macro name is fine.
>>
> 
> Yeah, having 'direct' in the name doesn't really make sense if it's used
> for indirect mapping. I will just add this new define instead of
> replacing the old one, and check for both. 
> Is that ok?


No, having two of these does not seem right or useful. It is pseries
which does not use petitboot (it relies on grub instead, so until the
target kernel is started there will be no DDW); realistically we need
this property for kexec/kdump, which uses the same kernel but a
different initramdisk, so for that purpose we need the same property name.

But I can see myself annoyed when I try petitboot in the hacked pseries
qemu and things may crash :) On this basis I'd suggest keeping the name
and adding a comment next to it that it is not always "direct" anymore.
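
(A minimal sketch of that suggestion: keep the existing DT string so a
kexec'ed kernel still finds the window, rename only the macro, and
document the change:)

/* The window is not necessarily a direct mapping anymore; the old DT
 * name is kept so kexec/kdump kernels keep finding the property. */
#define DMA64_PROPNAME "linux,direct64-ddr-window-info"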


> 
>>
>>> To make sure the property differentiates both cases, a new u32 for flags
>>> was added at the end of the property, where BIT(0) set means direct
>>> mapping.
>>>
>>> Signed-off-by: Leonardo Bras 
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 108 +++--
>>>  1 file changed, 84 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> 

Re: [PATCH v1 09/10] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition

2020-08-28 Thread Leonardo Bras
On Mon, 2020-08-24 at 15:17 +1000, Alexey Kardashevskiy wrote:
> 
> On 18/08/2020 09:40, Leonardo Bras wrote:
> > As of today, if the biggest DDW that can be created can't map the whole
> > partition, its creation is skipped and the default DMA window
> > "ibm,dma-window" is used instead.
> > 
> > DDW is 16x bigger than the default DMA window,
> 
> 16x only under very specific circumstances which are
> 1. phyp
> 2. sriov
> 3. device class in hmc (or what that priority number is in the lpar config).

Yeah, missing details.

> > having the same amount of
> > pages, but increasing the page size to 64k.
> > Besides larger DMA window,
> 
> "Besides being larger"?

You are right there.

> 
> > it performs better for allocations over 4k,
> 
> Better how?

I was thinking of allocations larger than (512 * 4k), since more than
one hypercall is needed there, while with 64k pages a single hypercall
still covers up to (512 * 64k).
But yeah, not the usual case anyway.

> 
> > so it would be nice to use it instead.
> 
> I'd rather say something like:
> ===
> So far we assumed we can map the guest RAM 1:1 to the bus which worked
> with a small number of devices. SRIOV changes it as the user can
> configure hundreds of VFs, and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per PE to limit waste of physical pages.
> ===

I mixed this into my commit message; it looks like this:

===
powerpc/pseries/iommu: Make use of DDW for indirect mapping

So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds of VFs, and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW
creation is skipped and the default DMA window "ibm,dma-window" is used
instead.

The default DMA window uses 4k pages instead of 64k pages, and since
the amount of pages is the same, making use of DDW instead of the
default DMA window for indirect mapping will expand by 16x the amount
of memory that can be mapped for DMA.

The DDW created will be used for direct mapping by default. [...]
===

What do you think?

> > The DDW created will be used for direct mapping by default.
> > If it's not available, indirect mapping will be used instead.
> > 
> > For indirect mapping, it's necessary to update the iommu_table so
> > iommu_alloc() can use the DDW created. For this,
> > iommu_table_update_window() is called when everything else succeeds
> > at enable_ddw().
> > 
> > Removing the default DMA window in order to use DDW with indirect mapping
> > is only allowed if there is no IOMMU memory currently allocated in
> > the iommu_table. enable_ddw() is aborted otherwise.
> > 
> > As there will never be both direct and indirect mappings at the same
> > time, the same property name can be used for the created DDW.
> > 
> > So renaming
> > define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> > to
> > define DMA64_PROPNAME "linux,dma64-ddr-window-info"
> > looks like the right thing to do.
> 
> I know I suggested this but this does not look so good anymore as I
> suspect it breaks kexec (from older kernel to this one) so you either
> need to check for both DT names or just keep the old one. Changing the
> macro name is fine.
> 

Yeah, having 'direct' in the name doesn't really make sense if it's used
for indirect mapping. I will just add this new define instead of
replacing the old one, and check for both. 
Is that ok?

> 
> > To make sure the property differentiates both cases, a new u32 for flags
> > was added at the end of the property, where BIT(0) set means direct
> > mapping.
> > 
> > Signed-off-by: Leonardo Bras 
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 108 +++--
> >  1 file changed, 84 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index 3a1ef02ad9d5..9544e3c91ced 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -350,8 +350,11 @@ struct dynamic_dma_window_prop {
> > __be64  dma_base;   /* address hi,lo */
> > __be32  tce_shift;  /* ilog2(tce_page_size) */
> > __be32  window_shift;   /* ilog2(tce_window_size) */
> > +   __be32  flags;  /* DDW properties, see below */
> >  };
> >  
> > +#define DDW_FLAGS_DIRECT   0x01
> 
> This is set if ((1<<window_shift) >= ddw_memory_hotplug_max()), you
> could simply check window_shift and drop the flags.
> 

Yeah, it's better this way, I will revert this.
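
(A minimal sketch of that check; ddwprop is the dynamic_dma_window_prop
from the patch, and ddw_memory_hotplug_max() already exists in iommu.c:)

/* Direct mapping iff the window is large enough to map all memory,
 * derived from window_shift alone, with no extra flag stored. */
bool direct = (1ULL << be32_to_cpu(ddwprop->window_shift)) >=
	      ddw_memory_hotplug_max();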

> 
> > +
> >  struct direct_window {
> > struct device_node *device;
> > const struct dynamic_dma_window_prop *prop;
> > @@ -377,7 +380,7 @@ static LIST_HEAD(direct_window_list);
> >  static DEFINE_SPINLOCK(direct_window_list_lock);
> >  /* protects initializing 

Re: [PATCH v1 09/10] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition

2020-08-23 Thread Alexey Kardashevskiy



On 18/08/2020 09:40, Leonardo Bras wrote:
> As of today, if the biggest DDW that can be created can't map the whole
> partition, its creation is skipped and the default DMA window
> "ibm,dma-window" is used instead.
> 
> DDW is 16x bigger than the default DMA window,

16x only under very specific circumstances which are
1. phyp
2. sriov
3. device class in hmc (or what that priority number is in the lpar config).

> having the same amount of
> pages, but increasing the page size to 64k.
> Besides larger DMA window,

"Besides being larger"?

> it performs better for allocations over 4k,

Better how?

> so it would be nice to use it instead.


I'd rather say something like:
===
So far we assumed we can map the guest RAM 1:1 to the bus which worked
with a small number of devices. SRIOV changes it as the user can
configure hundreds of VFs, and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit waste of physical pages.
===


> 
> The DDW created will be used for direct mapping by default.
> If it's not available, indirect mapping will be used instead.
> 
> For indirect mapping, it's necessary to update the iommu_table so
> iommu_alloc() can use the DDW created. For this,
> iommu_table_update_window() is called when everything else succeeds
> at enable_ddw().
> 
> Removing the default DMA window in order to use DDW with indirect mapping
> is only allowed if there is no IOMMU memory currently allocated in
> the iommu_table. enable_ddw() is aborted otherwise.
> 
> As there will never be both direct and indirect mappings at the same
> time, the same property name can be used for the created DDW.
> 
> So renaming
> define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> to
> define DMA64_PROPNAME "linux,dma64-ddr-window-info"
> looks like the right thing to do.

I know I suggested this but this does not look so good anymore as I
suspect it breaks kexec (from older kernel to this one) so you either
need to check for both DT names or just keep the old one. Changing the
macro name is fine.


> 
> To make sure the property differentiates both cases, a new u32 for flags
> was added at the end of the property, where BIT(0) set means direct
> mapping.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 108 +++--
>  1 file changed, 84 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 3a1ef02ad9d5..9544e3c91ced 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -350,8 +350,11 @@ struct dynamic_dma_window_prop {
>   __be64  dma_base;   /* address hi,lo */
>   __be32  tce_shift;  /* ilog2(tce_page_size) */
>   __be32  window_shift;   /* ilog2(tce_window_size) */
> + __be32  flags;  /* DDW properties, see below */
>  };
>  
> +#define DDW_FLAGS_DIRECT 0x01

This is set if ((1<<window_shift) >= ddw_memory_hotplug_max()), you
could simply check window_shift and drop the flags.


> +
>  struct direct_window {
>   struct device_node *device;
>   const struct dynamic_dma_window_prop *prop;
> @@ -377,7 +380,7 @@ static LIST_HEAD(direct_window_list);
>  static DEFINE_SPINLOCK(direct_window_list_lock);
>  /* protects initializing window twice for same device */
>  static DEFINE_MUTEX(direct_window_init_mutex);
> -#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>  
>  static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>   unsigned long num_pfn, const void *arg)
> @@ -836,7 +839,7 @@ static void remove_ddw(struct device_node *np, bool 
> remove_prop)
>   if (ret)
>   return;
>  
> - win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> + win = of_find_property(np, DMA64_PROPNAME, NULL);
>   if (!win)
>   return;
>  
> @@ -852,7 +855,7 @@ static void remove_ddw(struct device_node *np, bool 
> remove_prop)
>   np, ret);
>  }
>  
> -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> +static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool 
> *direct_mapping)
>  {
>   struct direct_window *window;
>   const struct dynamic_dma_window_prop *direct64;
> @@ -864,6 +867,7 @@ static bool find_existing_ddw(struct device_node *pdn, 
> u64 *dma_addr)
>   if (window->device == pdn) {
>   direct64 = window->prop;
>   *dma_addr = be64_to_cpu(direct64->dma_base);
> + *direct_mapping = be32_to_cpu(direct64->flags) & 
> DDW_FLAGS_DIRECT;
>   found = true;
>   break;
>   }
> @@ -901,8 +905,8 @@ static int find_existing_ddw_windows(void)
>   if (!firmware_has_feature(FW_FEATURE_LPAR))
>   return 0;
>  
> 

[PATCH v1 09/10] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition

2020-08-17 Thread Leonardo Bras
As of today, if the biggest DDW that can be created can't map the whole
partition, its creation is skipped and the default DMA window
"ibm,dma-window" is used instead.

DDW is 16x bigger than the default DMA window, having the same amount of
pages, but increasing the page size to 64k.
Besides larger DMA window, it performs better for allocations over 4k,
so it would be nice to use it instead.

The DDW created will be used for direct mapping by default.
If it's not available, indirect mapping will be used instead.

For indirect mapping, it's necessary to update the iommu_table so
iommu_alloc() can use the DDW created. For this,
iommu_table_update_window() is called when everything else succeeds
at enable_ddw().

Removing the default DMA window in order to use DDW with indirect mapping
is only allowed if there is no IOMMU memory currently allocated in
the iommu_table. enable_ddw() is aborted otherwise.

As there will never be both direct and indirect mappings at the same
time, the same property name can be used for the created DDW.

So renaming
define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
to
define DMA64_PROPNAME "linux,dma64-ddr-window-info"
looks like the right thing to do.

To make sure the property differentiates both cases, a new u32 for flags
was added at the end of the property, where BIT(0) set means direct
mapping.

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/platforms/pseries/iommu.c | 108 +++--
 1 file changed, 84 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 3a1ef02ad9d5..9544e3c91ced 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -350,8 +350,11 @@ struct dynamic_dma_window_prop {
__be64  dma_base;   /* address hi,lo */
__be32  tce_shift;  /* ilog2(tce_page_size) */
__be32  window_shift;   /* ilog2(tce_window_size) */
+   __be32  flags;  /* DDW properties, see below */
 };
 
+#define DDW_FLAGS_DIRECT   0x01
+
 struct direct_window {
struct device_node *device;
const struct dynamic_dma_window_prop *prop;
@@ -377,7 +380,7 @@ static LIST_HEAD(direct_window_list);
 static DEFINE_SPINLOCK(direct_window_list_lock);
 /* protects initializing window twice for same device */
 static DEFINE_MUTEX(direct_window_init_mutex);
-#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
 
 static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
unsigned long num_pfn, const void *arg)
@@ -836,7 +839,7 @@ static void remove_ddw(struct device_node *np, bool 
remove_prop)
if (ret)
return;
 
-   win = of_find_property(np, DIRECT64_PROPNAME, NULL);
+   win = of_find_property(np, DMA64_PROPNAME, NULL);
if (!win)
return;
 
@@ -852,7 +855,7 @@ static void remove_ddw(struct device_node *np, bool 
remove_prop)
np, ret);
 }
 
-static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
+static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool 
*direct_mapping)
 {
struct direct_window *window;
const struct dynamic_dma_window_prop *direct64;
@@ -864,6 +867,7 @@ static bool find_existing_ddw(struct device_node *pdn, u64 
*dma_addr)
if (window->device == pdn) {
direct64 = window->prop;
*dma_addr = be64_to_cpu(direct64->dma_base);
+   *direct_mapping = be32_to_cpu(direct64->flags) & 
DDW_FLAGS_DIRECT;
found = true;
break;
}
@@ -901,8 +905,8 @@ static int find_existing_ddw_windows(void)
if (!firmware_has_feature(FW_FEATURE_LPAR))
return 0;
 
-   for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
-   direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
+   for_each_node_with_property(pdn, DMA64_PROPNAME) {
+   direct64 = of_get_property(pdn, DMA64_PROPNAME, &len);
if (!direct64)
continue;
 
@@ -1124,7 +1128,8 @@ static void reset_dma_window(struct pci_dev *dev, struct 
device_node *par_dn)
 }
 
 static int ddw_property_create(struct property **ddw_win, const char *propname,
-  u32 liobn, u64 dma_addr, u32 page_shift, u32 
window_shift)
+  u32 liobn, u64 dma_addr, u32 page_shift,
+  u32 window_shift, bool direct_mapping)
 {
struct dynamic_dma_window_prop *ddwprop;
struct property *win64;
@@ -1144,6 +1149,36 @@ static int ddw_property_create(struct property 
**ddw_win, const char *propname,
ddwprop->dma_base = cpu_to_be64(dma_addr);
ddwprop->tce_shift = cpu_to_be32(page_shift);
ddwprop->window_shift =