Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-29 Thread Jason Gunthorpe
On Thu, Apr 29, 2021 at 03:26:55PM +0200, Auger Eric wrote:
> From the pseudo code,
> 
>   gpa_ioasid_id = ioctl(ioasid_fd, CREATE_IOASID, ..)
>   ioctl(ioasid_fd, SET_IOASID_PAGE_TABLES, ..)
> 
> I fail to understand whether the SET_IOASID_PAGE_TABLES would apply to
> all the IOASIDs within /dev/ioasid or to a specific one.

Sorry, nearly every IOCTL would be scoped to a specific IOASID as one
of the arguments.

> Also, in subsequent emails when you talk about an IOASID, is it the
> ioasid_id? Just to double check the terminology.

I am referring to IOASID as 'handle of the page table object inside the
/dev/ioasid fd'. Whether that is equal to some HW value or not I think
remains a decision point.

Basically the fd has an xarray of 'struct [something] *' and the
IOASID is an index into that FD's private xarray. This is necessary
for proper security: even if we have global PASID numbers or
something, they still need to be isolated to only the FD that has
been authorized to access them.
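
A minimal sketch of what I mean (invented names, not a proposal for the
actual code) - the fd's private context owns an xarray, and the IOASID
handed back to userspace is nothing more than the index the object was
stored at:

  #include <linux/types.h>
  #include <linux/xarray.h>

  struct ioasid_ctx {                    /* one per open /dev/ioasid fd */
          struct xarray ioasid_xa;       /* index == IOASID handle */
  };

  struct ioasid_obj {
          struct ioasid_ctx *ctx;
          /* page table / iommu_domain / refcount, etc. */
  };

  static int ioasid_alloc_handle(struct ioasid_ctx *ctx,
                                 struct ioasid_obj *obj, u32 *out_id)
  {
          /* the handle is only meaningful within this ctx/fd */
          return xa_alloc(&ctx->ioasid_xa, out_id, obj, xa_limit_32b,
                          GFP_KERNEL);
  }

  static struct ioasid_obj *ioasid_get(struct ioasid_ctx *ctx, u32 ioasid)
  {
          /* every IOASID-scoped ioctl starts with a lookup like this */
          return xa_load(&ctx->ioasid_xa, ioasid);
  }

Even if the HW PASID space ends up being global, nothing outside this fd
can name the object, which is the security property above.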

> >   nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID,  gpa_ioasid_id);
> >   ioctl(ioasid_fd, SET_NESTED_IOASID_PAGE_TABLES, nested_ioasid, ..)
> is the nested_ioasid the allocated PASID id or is it a completely
> different object id?

It is the IOASID handle above.

> >
> >// IOMMU will match on the device RID, no PASID:
> >   ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);
> > 
> >// IOMMU will match on the device RID and PASID:
> >   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
> here I see you pass a different pasid, so I guess they are different, in
> which case you would need to have an allocator function for this pasid,
> right?

Yes, the underlying HW ID (PASID or substream id or whatever) is
something slightly different.
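
Roughly, sticking to the pseudo-code style from earlier in the thread
(ALLOC_PASID is purely hypothetical here, it just stands in for wherever
the HW PASID value ends up coming from):

  // IOASID handles: indexes into this fd's private table of page tables
  gpa_ioasid_id = ioctl(ioasid_fd, CREATE_IOASID, ..)
  nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID, gpa_ioasid_id);

  // HW PASID: a separate number that the IOMMU actually matches on
  pasid = ioctl(ioasid_fd, ALLOC_PASID, ..)      // hypothetical allocator

  // IOMMU will match on the device RID only:
  ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);

  // IOMMU will match on (RID, PASID), pointing that PASID at the same
  // page table object:
  ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);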

Jason


Re: [PATCH] swiotlb: don't override user specified size in swiotlb_adjust_size

2021-04-29 Thread Konrad Rzeszutek Wilk
On Thu, Apr 29, 2021 at 08:45:51AM -0500, Tom Lendacky wrote:
> On 4/29/21 1:28 AM, Christoph Hellwig wrote:
> > If the user already specified a swiotlb size on the command line,
> > swiotlb_adjust_size should not overwrite it.
> > 
> > Fixes: 2cbc2776efe4 ("swiotlb: remove swiotlb_nr_tbl")
> > Reported-by: Tom Lendacky 
> 
> Thanks, Christoph!
> 
> Tested-by: Tom Lendacky 

Awesome!  I put it on the stable/for-linus-5.13 branch and will send a
GIT PULL to Linus later this week.

Thank you!
> 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  kernel/dma/swiotlb.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 8635a57f88e9..8ca7d505d61c 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -118,6 +118,8 @@ void __init swiotlb_adjust_size(unsigned long size)
> >  * architectures such as those supporting memory encryption to
> >  * adjust/expand SWIOTLB size for their use.
> >  */
> > +   if (default_nslabs != IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT)
> > +   return;
> > size = ALIGN(size, IO_TLB_SIZE);
> > default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
> > pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
> > 


Re: [PATCH] swiotlb: don't override user specified size in swiotlb_adjust_size

2021-04-29 Thread Tom Lendacky
On 4/29/21 1:28 AM, Christoph Hellwig wrote:
> If the user already specified a swiotlb size on the command line,
> swiotlb_adjust_size should not overwrite it.
> 
> Fixes: 2cbc2776efe4 ("swiotlb: remove swiotlb_nr_tbl")
> Reported-by: Tom Lendacky 

Thanks, Christoph!

Tested-by: Tom Lendacky 

> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/swiotlb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 8635a57f88e9..8ca7d505d61c 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -118,6 +118,8 @@ void __init swiotlb_adjust_size(unsigned long size)
>* architectures such as those supporting memory encryption to
>* adjust/expand SWIOTLB size for their use.
>*/
> + if (default_nslabs != IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT)
> + return;
>   size = ALIGN(size, IO_TLB_SIZE);
>   default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
>   pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
> 


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-29 Thread Auger Eric
Hi,

On 4/22/21 2:10 PM, Jason Gunthorpe wrote:
> On Thu, Apr 22, 2021 at 08:34:32AM +, Tian, Kevin wrote:
> 
>> The shim layer could be considered as a new iommu backend in VFIO,
>> which connects VFIO iommu ops to the internal helpers in
>> drivers/ioasid.
> 
> It may be the best we can do because of SPAPR, but the ideal outcome
> should be to remove the entire pluggable IOMMU stuff from vfio
> entirely and have it only use /dev/ioasid
> 
> We should never add another pluggable IOMMU type to vfio - everything
> should be done through drivers/iommu now that it is much more capable.
> 
>> Another tricky thing is that a container may be linked to multiple iommu
>> domains in VFIO, as devices in the container may be located behind
>> different IOMMUs with inconsistent capabilities (commit 1ef3e2bc).
> 
> Frankly this sounds over complicated. I would think /dev/ioasid should
> select the IOMMU when the first device is joined, and all future joins
> must be compatible with the original IOMMU - ie there is only one set
> of IOMMU capabilities in a /dev/ioasid.
> 
> This means qemu might have multiple /dev/ioasid's if the system has
> multiple incompatible IOMMUs (is this actually a thing?) The platform
> should design its IOMMU domains to minimize the number of
> /dev/ioasid's required.
> 
> Is there a reason we need to share IOASIDs between completely
> divergent IOMMU implementations? I don't expect the HW should be able
> to physically share page tables??
> 
> That decision point alone might be the thing that just says we can't
> ever have /dev/vfio/vfio == /dev/ioasid
> 
>> Just to confirm. Above flow is for current map/unmap flavor as what
>> VFIO/vDPA do today. Later when nested translation is supported,
>> there is no need to detach gpa_ioasid_fd. Instead, a new cmd will
>> be introduced to nest rid_ioasid_fd on top of gpa_ioasid_fd:
> 
> Sure.. The tricky bit will be to define both of the common nested
> operating modes.
>

From the pseudo code,

  gpa_ioasid_id = ioctl(ioasid_fd, CREATE_IOASID, ..)
  ioctl(ioasid_fd, SET_IOASID_PAGE_TABLES, ..)

I fail to understand whether the SET_IOASID_PAGE_TABLES would apply to
all the IOASIDs within /dev/ioasid or to a specific one.

Also, in subsequent emails when you talk about an IOASID, is it the
ioasid_id? Just to double check the terminology.


>   nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID,  gpa_ioasid_id);
>   ioctl(ioasid_fd, SET_NESTED_IOASID_PAGE_TABLES, nested_ioasid, ..)
is the nested_ioasid the allocated PASID id or is it a completely
different object id?
> 
>// IOMMU will match on the device RID, no PASID:
>   ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);
> 
>// IOMMU will match on the device RID and PASID:
>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
here I see you pass a different pasid, so I guess they are different, in
which case you would need to have an allocator function for this pasid,
right?

Thanks

Eric
> 
> Notice that ATTACH (or bind, whatever) is always done on the
> vfio_device FD. ATTACH tells the IOMMU HW to link the PCI BDF to
> a specific page table defined by an IOASID.
> 
> I expect we have many flavours of IOASID tables, eg we have normal,
> and 'nested with table controlled by hypervisor'. ARM has 'nested with
> table controlled by guest' right? So like this?
> 
>   nested_ioasid = ioctl(ioasid_fd, CREATE_DELEGATED_IOASID,
>gpa_ioasid_id, )
>   // PASID now goes to 
>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);

> 
> Where  is some internal-to-the-guest handle of the viommu
> page table scoped within gpa_ioasid_id? Like maybe it is the GPA of the
> base of the page table?
> 
> The guest can't select its own PASIDs without telling the hypervisor,
> right?
> 
>> I also feel hiding group from uAPI is a good thing and is interested in
>> the rationale behind for explicitly managing group in vfio (which is
>> essentially the same boundary as provided by iommu group), e.g. for 
>> better user experience when group security is broken? 
> 
> Indeed, I can see how things might have just evolved into this, but if
> it has a purpose it seems pretty hidden. Whether we need it or not
> seems pretty hidden too.
> 
> Jason
> 



Re: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions

2021-04-29 Thread Dmitry Osipenko
29.04.2021 08:51, Krishna Reddy wrote:
> Hi Dmitry,
> 
>> Thank you for the answer. Could you please give more information about:
>> 1) Are you on software or hardware team, or both?
> 
> I am on the software team and have contributed to the initial Tegra SMMU
> driver downstream, along with former team member Hiroshi Doyu.
> 
>> 2) Is SMMU a third party IP or developed in-house?
> 
> Tegra SMMU is developed in-house. 
> 
>> 3) Do you have a direct access to HDL sources? Are you 100% sure that
>> hardware does what you say?
> 
> It was discussed with the hardware team before, and again today.
> Enabling the ASID for the display engine while it continues to access
> the buffer memory is a safe operation.
> As per the HW team, the only side effect that can happen is additional
> latency on transactions while the SMMU caches get warmed up.
> 
>> 4) What happens when CPU writes to ASID register? Does SMMU state machine
>> latch ASID status (making it visible) only at a single "safe" point?
> 
> The MC decides whether to route a transaction through the SMMU page
> tables or to bypass them, based on the ASID register value.  It checks
> the ASID register only once per transaction in the pipeline.

Thank you very much for the clarification.

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-29 Thread Auger Eric
Hi,

On 4/22/21 2:10 PM, Jason Gunthorpe wrote:
> On Thu, Apr 22, 2021 at 08:34:32AM +, Tian, Kevin wrote:
> 
>> The shim layer could be considered as a new iommu backend in VFIO,
>> which connects VFIO iommu ops to the internal helpers in
>> drivers/ioasid.
> 
> It may be the best we can do because of SPAPR, but the ideal outcome
> should be to remove the entire pluggable IOMMU stuff from vfio
> entirely and have it only use /dev/ioasid
> 
> We should never add another pluggable IOMMU type to vfio - everything
> should be done through drivers/iommu now that it is much more capable.
> 
>> Another tricky thing is that a container may be linked to multiple iommu
>> domains in VFIO, as devices in the container may be located behind
>> different IOMMUs with inconsistent capabilities (commit 1ef3e2bc).
> 
> Frankly this sounds over complicated. I would think /dev/ioasid should
> select the IOMMU when the first device is joined, and all future joins
> must be compatible with the original IOMMU - ie there is only one set
> of IOMMU capabilities in a /dev/ioasid.
> 
> This means qemu might have multiple /dev/ioasid's if the system has
> multiple incompatible IOMMUs (is this actually a thing?) The platform
> should design its IOMMU domains to minimize the number of
> /dev/ioasid's required.
> 
> Is there a reason we need to share IOASIDs between completely
> divergent IOMMU implementations? I don't expect the HW should be able
> to physically share page tables??
> 
> That decision point alone might be the thing that just says we can't
> ever have /dev/vfio/vfio == /dev/ioasid
> 
>> Just to confirm. Above flow is for current map/unmap flavor as what
>> VFIO/vDPA do today. Later when nested translation is supported,
>> there is no need to detach gpa_ioasid_fd. Instead, a new cmd will
>> be introduced to nest rid_ioasid_fd on top of gpa_ioasid_fd:
> 
> Sure.. The tricky bit will be to define both of the common nested
> operating modes.
> 
>   nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID,  gpa_ioasid_id);
>   ioctl(ioasid_fd, SET_NESTED_IOASID_PAGE_TABLES, nested_ioasid, ..)
> 
>// IOMMU will match on the device RID, no PASID:
>   ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);
> 
>// IOMMU will match on the device RID and PASID:
>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
> 
> Notice that ATTACH (or bind, whatever) is always done on the
> vfio_device FD. ATTACH tells the IOMMU HW to link the PCI BDF to
> a specific page table defined by an IOASID.
> 
> I expect we have many flavours of IOASID tables, eg we have normal,
> and 'nested with table controlled by hypervisor'. ARM has 'nested with
> table controlled by guest' right? So like this?
Yes, the PASID table is fully controlled by the guest. Same for the
stage 1 table.
> 
>   nested_ioasid = ioctl(ioasid_fd, CREATE_DELEGATED_IOASID,
>gpa_ioasid_id, )
>   // PASID now goes to 
>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
> 
> Where  is some internal-to-the-guest handle of the viommu
> page table scoped within gpa_ioasid_id? Like maybe it is the GPA of the
> base of the page table?
Yes, the GPA of the first-level page table plus some misc info, like the
max number of IOASIDs.
> 
> The guest can't select its own PASIDs without telling the hypervisor,
> right?
On ARM there is no system-wide IOASID allocator as there is for x86, so
the guest can select its own PASID without telling the hypervisor.
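
Very roughly, the kind of information such a delegated IOASID would need
to carry on ARM could look like the struct below (names invented purely
for illustration, this is not an existing or proposed uAPI):

  #include <linux/types.h>

  /* illustration only */
  struct ioasid_delegated_info_arm {
          __u64 pasid_table_gpa;  /* GPA of the guest-owned PASID/CD table */
          __u32 pasid_bits;       /* log2 of max number of PASIDs/IOASIDs */
          __u32 flags;
  };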

Thanks

Eric
> 
>> I also feel hiding group from uAPI is a good thing and is interested in
>> the rationale behind for explicitly managing group in vfio (which is
>> essentially the same boundary as provided by iommu group), e.g. for 
>> better user experience when group security is broken? 
> 
> Indeed, I can see how things might have just evolved into this, but if
> it has a purpose it seems pretty hidden. Whether we need it or not
> seems pretty hidden too.
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-29 Thread Auger Eric
Hi,

On 4/23/21 1:49 PM, Jason Gunthorpe wrote:
> On Fri, Apr 23, 2021 at 09:06:44AM +, Tian, Kevin wrote:
> 
>> Or could we still have just one /dev/ioasid but allow userspace to create
>> multiple gpa_ioasid_id's, each associated with a different iommu domain?
>> Then the compatibility check will be done at ATTACH_IOASID instead of 
>> JOIN_IOASID_FD.
> 
> To my mind what makes sense is that /dev/ioasid presents a single
> IOMMU behavior that is basically the same. This may ultimately not be
> what we call a domain today.
> 
> We may end up with a middle object which is a group of domains that
> all have the same capabilities, and we define capabilities in a way
> that most platforms have a single group of domains.
> 
> The key capability of a group of domains is they can all share the HW
> page table representation, so if an IOASID instantiates a page table
> it can be assigned to any device on any domain in the group of domains.
> 
> If you try to say that /dev/ioasid has many domains and they can't
> have their HW page tables shared then I think the implementation
> complexity will explode.
> 
>> This does impose one burden to userspace though, to understand the 
>> IOMMU compatibilities and figure out which incompatible features may
>> affect the page table management (while such knowledge is IOMMU
>> vendor specific) and then explicitly manage multiple /dev/ioasid's or 
>> multiple gpa_ioasid_id's.
> 
> Right, this seems very hard in the general case..
>  
>> Alternatively is it a good design by having the kernel return error at
>> attach/join time to indicate that incompatibility is detected then the 
>> userspace should open a new /dev/ioasid or creates a new gpa_ioasid_id
>> for the failing device upon such failure, w/o constructing its own 
>> compatibility knowledge?
> 
> Yes, this feels workable too
> 
>>> This means qemue might have multiple /dev/ioasid's if the system has
>>> multiple incompatible IOMMUs (is this actually a thing?) The platform
>>
>> One example is Intel platform with igd. Typically there is one IOMMU
>> dedicated for igd and the other IOMMU serving all the remaining devices.
>> The igd IOMMU may not support IOMMU_CACHE while the other one
>> does.
> 
> If we can do as above the two domains may be in the same group of
> domains and the IOMMU_CACHE is not exposed at the /dev/ioasid level.
> 
> For instance the API could specify IOMMU_CACHE during attach, not
> during IOASID creation.
> 
> Getting all the data model right in the API is going to be trickiest
> part of this.
> 
>> yes, e.g. in vSVA both devices (behind divergent IOMMUs) are bound
>> to a single guest process which has an unique PASID and 1st-level page
>> table. Earlier incompatibility example is only for 2nd-level.
> 
> Because when we get here, things become inscrutable as an API if
> you are trying to say two different IOMMU presentations can actually
> be nested.
> 
>>> Sure.. The tricky bit will be to define both of the common nested
>>> operating modes.
>>>
>>>   nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID,  gpa_ioasid_id);
>>>   ioctl(ioasid_fd, SET_NESTED_IOASID_PAGE_TABLES, nested_ioasid, ..)
>>>
>>>// IOMMU will match on the device RID, no PASID:
>>>   ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);
>>>
>>>// IOMMU will match on the device RID and PASID:
>>>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
>>
>> I'm a bit confused here why we have both pasid and ioasid notations together.
>> Why not use nested_ioasid as pasid directly (i.e. every pasid in nested mode
>> is created by CREATE_NESTED_IOASID)?
> 
> The IOASID is not a PASID, it is just a page table.
> 
> A generic IOMMU matches on either RID or (RID,PASID), so you should
> specify the PASID when establishing the match.
> 
> IOASID only specifies the page table.
> 
> So you read the above as configuring the path
> 
>   PCI_DEVICE -> (RID,PASID) -> nested_ioasid -> gpa_ioasid_id -> physical
> 
> Where (RID,PASID) indicate values taken from the PCI packet.
> 
> In principle the IOMMU could also be commanded to reuse the same
> ioasid page table with a different PASID:
> 
>   PCI_DEVICE_B -> (RID_B,PASID_B) -> nested_ioasid -> gpa_ioasid_id -> physical
> 
> This is impossible if the ioasid == PASID in the API.
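
(In the pseudo-code style used earlier in the thread, I read that reuse
as something like the lines below - illustrative only:

  // two devices, two different (RID,PASID) matches, one shared page table
  ioctl(vfio_device_a, ATTACH_IOASID_PASID, pasid_a, nested_ioasid);
  ioctl(vfio_device_b, ATTACH_IOASID_PASID, pasid_b, nested_ioasid);

which is indeed not expressible if the IOASID handle itself were the PASID.)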
> 
>> Below I list different scenarios for ATTACH_IOASID in my view. Here 
>> vfio_device could be a real PCI function (RID), or a subfunction device 
>> (RID+def_ioasid). 
> 
> What is RID+def_ioasid? The IOMMU does not match on IOASID's.
> 
> A subfunction device always need to use PASID, or an internal IOMMU,
> confused what you are trying to explain?
> 
>> If the whole PASID table is delegated to the guest in ARM case, the guest
>> can select its own PASIDs w/o telling the hypervisor. 
> 
> The hypervisor has to route the PASID's to the guest at some point - a
> guest can't just claim a PASID unilaterally, that would not be secure.
AFAIU On ARM the stage 2 table is uniquely defined per 

[PATCH] swiotlb: don't override user specified size in swiotlb_adjust_size

2021-04-29 Thread Christoph Hellwig
If the user already specified a swiotlb size on the command line,
swiotlb_adjust_size should not overwrite it.

Fixes: 2cbc2776efe4 ("swiotlb: remove swiotlb_nr_tbl")
Reported-by: Tom Lendacky 
Signed-off-by: Christoph Hellwig 
---
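For reference, the only code that changes default_nslabs this early is the
swiotlb= early parameter handler; approximately (my paraphrase of the
current kernel/dma/swiotlb.c, not quoted verbatim):

  static int __init
  setup_io_tlb_npages(char *str)
  {
          if (isdigit(*str)) {
                  /* avoid tail segment of size < IO_TLB_SEGSIZE */
                  default_nslabs =
                          ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
          }
          if (*str == ',')
                  ++str;
          if (!strcmp(str, "force"))
                  swiotlb_force = SWIOTLB_FORCE;
          else if (!strcmp(str, "noforce"))
                  swiotlb_force = SWIOTLB_NO_FORCE;

          return 0;
  }
  early_param("swiotlb", setup_io_tlb_npages);

So the new early return only skips the adjustment when the user has already
moved default_nslabs away from its build-time default via swiotlb=.
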
 kernel/dma/swiotlb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8635a57f88e9..8ca7d505d61c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -118,6 +118,8 @@ void __init swiotlb_adjust_size(unsigned long size)
 * architectures such as those supporting memory encryption to
 * adjust/expand SWIOTLB size for their use.
 */
+   if (default_nslabs != IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT)
+   return;
size = ALIGN(size, IO_TLB_SIZE);
default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
-- 
2.30.2
