Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-05 Thread Auger Eric
Hi Jason,

On 4/29/21 10:04 PM, Jason Gunthorpe wrote:
> On Thu, Apr 29, 2021 at 03:26:55PM +0200, Auger Eric wrote:
>> From the pseudo code,
>>
>>   gpa_ioasid_id = ioctl(ioasid_fd, CREATE_IOASID, ..)
>>   ioctl(ioasid_fd, SET_IOASID_PAGE_TABLES, ..)
>>
>> I fail to understand whether SET_IOASID_PAGE_TABLES would apply to
>> all the IOASIDs within /dev/ioasid or to a specific one.
> 
> Sorry, nearly every IOCTL would be scoped to a specific IOASID as one
> of the arguments.

OK thank you for the clarification.
> 
>> Also, in subsequent emails when you talk about IOASID, is it the
>> ioasid_id? Just double-checking the terminology.
> 
> I am referring to IOASID as 'handle of the page table object inside the
> /dev/ioasid fd'. If that is equal to some HW value or not I think
> remains a decision point.
OK
> 
> Basically the fd has an xarray of 'struct [something] *' and the
> IOASID is an index into that FD's private xarray. This is necessary to
> create proper security as even if we have global PASID numbers or
> something they still need to be isolated to only the FD that has
> been authorized access.
> 
>>>   nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID,  gpa_ioasid_id);
>>>   ioctl(ioasid_fd, SET_NESTED_IOASID_PAGE_TABLES, nested_ioasid, ..)
>> Is the nested_ioasid the allocated PASID id, or is it a completely
>> different object id?
> 
> It is the IOASID handle above.
OK, as per the following emails and the comment below, IOASID and PASID are
different. The first would be a logical ID while the second is the HW ID.
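
To make the handle-versus-HW-ID distinction concrete, here is a minimal
userspace sketch of a per-fd handle table. It is only an illustration of the
idea quoted above (an IOASID as an index into the fd's private table); every
name in it (struct ioasid_fd_ctx, ioasid_create, ioasid_lookup) is
hypothetical, a fixed array stands in for the kernel xarray, and none of it is
a proposed uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IOASIDS    16       /* arbitrary capacity for the sketch */
#define IOASID_INVALID (-1)

/* Hypothetical stand-in for the page table object behind a handle. */
struct pgtable_obj {
    int      in_use;
    uint64_t root;              /* e.g. base address of the page table */
};

/* Hypothetical per-fd context: each open fd owns a private table. */
struct ioasid_fd_ctx {
    struct pgtable_obj slots[MAX_IOASIDS];
};

/* Allocate a handle (the "IOASID") that is private to this fd context. */
static int ioasid_create(struct ioasid_fd_ctx *ctx, uint64_t root)
{
    assert(ctx != NULL);
    for (int i = 0; i < MAX_IOASIDS; i++) {
        if (!ctx->slots[i].in_use) {
            ctx->slots[i].in_use = 1;
            ctx->slots[i].root = root;
            return i;
        }
    }
    return IOASID_INVALID;
}

/* Resolve a handle; it only resolves in the context that created it. */
static struct pgtable_obj *ioasid_lookup(struct ioasid_fd_ctx *ctx, int ioasid)
{
    assert(ctx != NULL);
    if (ioasid < 0 || ioasid >= MAX_IOASIDS || !ctx->slots[ioasid].in_use)
        return NULL;
    return &ctx->slots[ioasid];
}
```

Because the handle is just an index, the same number in another fd's table
refers to nothing (or to an unrelated object), which is the isolation property
being discussed. The HW ID is deliberately absent from this structure.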

Thanks

Eric
> 
>>>
>>>   // IOMMU will match on the device RID, no PASID:
>>>   ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);
>>>
>>>   // IOMMU will match on the device RID and PASID:
>>>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
>> Here I see you pass a different pasid, so I guess they are different, in
>> which case you would need to have an allocator function for this pasid,
>> right?
> 
> Yes, the underlying HW ID (PASID or substream id or whatever) is
> something slightly different
> 
> Jason
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-29 Thread Auger Eric
Hi,

On 4/22/21 2:10 PM, Jason Gunthorpe wrote:
> On Thu, Apr 22, 2021 at 08:34:32AM +, Tian, Kevin wrote:
> 
>> The shim layer could be considered as a new iommu backend in VFIO,
>> which connects VFIO iommu ops to the internal helpers in
>> drivers/ioasid.
> 
> It may be the best we can do because of SPAPR, but the ideal outcome
> should be to remove the entire pluggable IOMMU stuff from vfio
> entirely and have it only use /dev/ioasid
> 
> We should never add another pluggable IOMMU type to vfio - everything
> should be done through drivers/iommu now that it is much more capable.
> 
>> Another tricky thing is that a container may be linked to multiple iommu
>> domains in VFIO, as devices in the container may locate behind different
>> IOMMUs with inconsistent capability (commit 1ef3e2bc). 
> 
> Frankly this sounds over complicated. I would think /dev/ioasid should
> select the IOMMU when the first device is joined, and all future joins
> must be compatible with the original IOMMU - ie there is only one set
> of IOMMU capabilities in a /dev/ioasid.
> 
> This means qemu might have multiple /dev/ioasid's if the system has
> multiple incompatible IOMMUs (is this actually a thing?) The platform
> should design its IOMMU domains to minimize the number of
> /dev/ioasid's required.
> 
> Is there a reason we need to share IOASIDs between completely
> divergent IOMMU implementations? I don't expect the HW should be able
> to physically share page tables??
> 
> That decision point alone might be the thing that just says we can't
> ever have /dev/vfio/vfio == /dev/ioasid
> 
>> Just to confirm. Above flow is for current map/unmap flavor as what
>> VFIO/vDPA do today. Later when nested translation is supported,
>> there is no need to detach gpa_ioasid_fd. Instead, a new cmd will
>> be introduced to nest rid_ioasid_fd on top of gpa_ioasid_fd:
> 
> Sure.. The tricky bit will be to define both of the common nested
> operating modes.
>

From the pseudo code,

  gpa_ioasid_id = ioctl(ioasid_fd, CREATE_IOASID, ..)
  ioctl(ioasid_fd, SET_IOASID_PAGE_TABLES, ..)

I fail to understand whether SET_IOASID_PAGE_TABLES would apply to
all the IOASIDs within /dev/ioasid or to a specific one.

Also, in subsequent emails when you talk about IOASID, is it the
ioasid_id? Just double-checking the terminology.


>   nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID,  gpa_ioasid_id);
>   ioctl(ioasid_fd, SET_NESTED_IOASID_PAGE_TABLES, nested_ioasid, ..)
Is the nested_ioasid the allocated PASID id, or is it a completely
different object id?
> 
>   // IOMMU will match on the device RID, no PASID:
>   ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);
> 
>   // IOMMU will match on the device RID and PASID:
>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
Here I see you pass a different pasid, so I guess they are different, in
which case you would need to have an allocator function for this pasid,
right?
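
One way to picture the two values in ATTACH_IOASID_PASID is a match table
keyed on (RID, PASID) whose entries point at an IOASID handle. The sketch
below is plain userspace C under that assumption; the names (struct
attach_table, attach_ioasid_pasid, attach_lookup, NO_PASID) are hypothetical
and do not correspond to any existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_PASID    UINT32_MAX  /* sketch-only marker: match on RID alone */
#define MAX_ATTACH  8           /* arbitrary capacity for the sketch */

/* One match rule: (rid, pasid) taken from the PCI packet -> page table. */
struct attach_entry {
    int      valid;
    uint16_t rid;
    uint32_t pasid;
    int      ioasid;            /* handle of the page table object */
};

struct attach_table {
    struct attach_entry e[MAX_ATTACH];
};

/* Install or replace the rule for (rid, pasid). Returns 0, or -1 if full. */
static int attach_ioasid_pasid(struct attach_table *t, uint16_t rid,
                               uint32_t pasid, int ioasid)
{
    struct attach_entry *free_slot = NULL;

    assert(t != NULL);
    for (size_t i = 0; i < MAX_ATTACH; i++) {
        struct attach_entry *e = &t->e[i];

        if (e->valid && e->rid == rid && e->pasid == pasid) {
            e->ioasid = ioasid; /* re-attach replaces the target */
            return 0;
        }
        if (!e->valid && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return -1;
    free_slot->valid = 1;
    free_slot->rid = rid;
    free_slot->pasid = pasid;
    free_slot->ioasid = ioasid;
    return 0;
}

/* Return the IOASID handle matched by (rid, pasid), or -1 if none. */
static int attach_lookup(const struct attach_table *t, uint16_t rid,
                         uint32_t pasid)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_ATTACH; i++) {
        const struct attach_entry *e = &t->e[i];

        if (e->valid && e->rid == rid && e->pasid == pasid)
            return e->ioasid;
    }
    return -1;
}
```

With this shape, two different (RID, PASID) pairs can point at the same
handle, which would not be expressible if the handle and the PASID were the
same number. Where the PASID value itself comes from is left open here.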

Thanks

Eric
> 
> Notice that ATTACH (or bind, whatever) is always done on the
> vfio_device FD. ATTACH tells the IOMMU HW to link the PCI BDF to
> a specific page table defined by an IOASID.
> 
> I expect we have many flavours of IOASID tables, eg we have normal,
> and 'nested with table controlled by hypervisor'. ARM has 'nested with
> table controlled by guest' right? So like this?
> 
>   nested_ioasid = ioctl(ioasid_fd, CREATE_DELGATED_IOASID,
>gpa_ioasid_id, )
>   // PASID now goes to 
>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);

> 
> Where  is some internal to the guest handle of the viommu
> page table scoped within gpa_ioasid_id? Like maybe it is GPA of the
> base of the page table?
> 
> The guest can't select its own PASIDs without telling the hypervisor,
> right?
> 
>> I also feel hiding group from uAPI is a good thing and is interested in
>> the rationale behind for explicitly managing group in vfio (which is
>> essentially the same boundary as provided by iommu group), e.g. for 
>> better user experience when group security is broken? 
> 
> Indeed, I can see how things might have just evolved into this, but if
> it has a purpose it seems pretty hidden.
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-29 Thread Auger Eric
Hi,

On 4/22/21 2:10 PM, Jason Gunthorpe wrote:
> On Thu, Apr 22, 2021 at 08:34:32AM +, Tian, Kevin wrote:
> 
>> The shim layer could be considered as a new iommu backend in VFIO,
>> which connects VFIO iommu ops to the internal helpers in
>> drivers/ioasid.
> 
> It may be the best we can do because of SPAPR, but the ideal outcome
> should be to remove the entire pluggable IOMMU stuff from vfio
> entirely and have it only use /dev/ioasid
> 
> We should never add another pluggable IOMMU type to vfio - everything
> should be done through drivers/iommu now that it is much more capable.
> 
>> Another tricky thing is that a container may be linked to multiple iommu
>> domains in VFIO, as devices in the container may locate behind different
>> IOMMUs with inconsistent capability (commit 1ef3e2bc). 
> 
> Frankly this sounds over complicated. I would think /dev/ioasid should
> select the IOMMU when the first device is joined, and all future joins
> must be compatible with the original IOMMU - ie there is only one set
> of IOMMU capabilities in a /dev/ioasid.
> 
> This means qemu might have multiple /dev/ioasid's if the system has
> multiple incompatible IOMMUs (is this actually a thing?) The platform
> should design its IOMMU domains to minimize the number of
> /dev/ioasid's required.
> 
> Is there a reason we need to share IOASIDs between completely
> divergent IOMMU implementations? I don't expect the HW should be able
> to physically share page tables??
> 
> That decision point alone might be the thing that just says we can't
> ever have /dev/vfio/vfio == /dev/ioasid
> 
>> Just to confirm. Above flow is for current map/unmap flavor as what
>> VFIO/vDPA do today. Later when nested translation is supported,
>> there is no need to detach gpa_ioasid_fd. Instead, a new cmd will
>> be introduced to nest rid_ioasid_fd on top of gpa_ioasid_fd:
> 
> Sure.. The tricky bit will be to define both of the common nested
> operating modes.
> 
>   nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID,  gpa_ioasid_id);
>   ioctl(ioasid_fd, SET_NESTED_IOASID_PAGE_TABLES, nested_ioasid, ..)
> 
>   // IOMMU will match on the device RID, no PASID:
>   ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);
> 
>   // IOMMU will match on the device RID and PASID:
>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
> 
> Notice that ATTACH (or bind, whatever) is always done on the
> vfio_device FD. ATTACH tells the IOMMU HW to link the PCI BDF to
> a specific page table defined by an IOASID.
> 
> I expect we have many flavours of IOASID tables, eg we have normal,
> and 'nested with table controlled by hypervisor'. ARM has 'nested with
> table controlled by guest' right? So like this?
Yes, the PASID table is fully controlled by the guest. Same for the stage
1 table.
> 
>   nested_ioasid = ioctl(ioasid_fd, CREATE_DELGATED_IOASID,
>gpa_ioasid_id, )
>   // PASID now goes to 
>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
> 
> Where  is some internal to the guest handle of the viommu
> page table scoped within gpa_ioasid_id? Like maybe it is GPA of the
> base of the page table?
Yes, the GPA of the first level page table + some misc info like the max
number of IOASIDs.
> 
> The guest can't select its own PASIDs without telling the hypervisor,
> right?
On ARM there is no system-wide IOASID allocator as there is for x86, so the
guest can select its own PASID without telling the hypervisor.

Thanks

Eric
> 
>> I also feel hiding group from uAPI is a good thing and is interested in
>> the rationale behind for explicitly managing group in vfio (which is
>> essentially the same boundary as provided by iommu group), e.g. for 
>> better user experience when group security is broken? 
> 
> Indeed, I can see how things might have just evolved into this, but if
> it has a purpose it seems pretty hidden.
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-29 Thread Auger Eric
Hi,

On 4/23/21 1:49 PM, Jason Gunthorpe wrote:
> On Fri, Apr 23, 2021 at 09:06:44AM +, Tian, Kevin wrote:
> 
>> Or could we still have just one /dev/ioasid but allow userspace to create
>> multiple gpa_ioasid_id's each associated to a different iommu domain? 
>> Then the compatibility check will be done at ATTACH_IOASID instead of 
>> JOIN_IOASID_FD.
> 
> To my mind what makes sense is that /dev/ioasid presents a single
> IOMMU behavior that is basically the same. This may ultimately not be
> what we call a domain today.
> 
> We may end up with a middle object which is a group of domains that
> all have the same capabilities, and we define capabilities in a way
> that most platforms have a single group of domains.
> 
> The key capability of a group of domains is they can all share the HW
> page table representation, so if an IOASID instantiates a page table
> it can be assigned to any device on any domain in the group of domains.
> 
> If you try to say that /dev/ioasid has many domains and they can't
> have their HW page tables shared then I think the implementation
> complexity will explode.
> 
>> This does impose one burden to userspace though, to understand the 
>> IOMMU compatibilities and figure out which incompatible features may
>> affect the page table management (while such knowledge is IOMMU
>> vendor specific) and then explicitly manage multiple /dev/ioasid's or 
>> multiple gpa_ioasid_id's.
> 
> Right, this seems very hard in the general case..
>  
>> Alternatively is it a good design by having the kernel return error at
>> attach/join time to indicate that incompatibility is detected then the 
>> userspace should open a new /dev/ioasid or creates a new gpa_ioasid_id
>> for the failing device upon such failure, w/o constructing its own 
>> compatibility knowledge?
> 
> Yes, this feels workable too
> 
>>> This means qemu might have multiple /dev/ioasid's if the system has
>>> multiple incompatible IOMMUs (is this actually a thing?) The platform
>>
>> One example is Intel platform with igd. Typically there is one IOMMU
>> dedicated for igd and the other IOMMU serving all the remaining devices.
>> The igd IOMMU may not support IOMMU_CACHE while the other one
>> does.
> 
> If we can do as above the two domains may be in the same group of
> domains and the IOMMU_CACHE is not exposed at the /dev/ioasid level.
> 
> For instance the API could specify IOMMU_CACHE during attach, not
> during IOASID creation.
> 
> Getting all the data model right in the API is going to be trickiest
> part of this.
> 
>> yes, e.g. in vSVA both devices (behind divergent IOMMUs) are bound
>> to a single guest process which has a unique PASID and 1st-level page
>> table. Earlier incompatibility example is only for 2nd-level.
> 
> Because when we get to here, things become inscrutable as an API if
> you are trying to say two different IOMMU presentations can actually
> be nested.
> 
>>> Sure.. The tricky bit will be to define both of the common nested
>>> operating modes.
>>>
>>>   nested_ioasid = ioctl(ioasid_fd, CREATE_NESTED_IOASID,  gpa_ioasid_id);
>>>   ioctl(ioasid_fd, SET_NESTED_IOASID_PAGE_TABLES, nested_ioasid, ..)
>>>
>>>   // IOMMU will match on the device RID, no PASID:
>>>   ioctl(vfio_device, ATTACH_IOASID, nested_ioasid);
>>>
>>>   // IOMMU will match on the device RID and PASID:
>>>   ioctl(vfio_device, ATTACH_IOASID_PASID, pasid, nested_ioasid);
>>
>> I'm a bit confused here why we have both pasid and ioasid notations together.
>> Why not use nested_ioasid as pasid directly (i.e. every pasid in nested mode
>> is created by CREATE_NESTED_IOASID)?
> 
> The IOASID is not a PASID, it is just a page table.
> 
> A generic IOMMU matches on either RID or (RID,PASID), so you should
> specify the PASID when establishing the match.
> 
> IOASID only specifies the page table.
> 
> So you read the above as configuring the path
> 
>   PCI_DEVICE -> (RID,PASID) -> nested_ioasid -> gpa_ioasid_id -> physical
> 
> Where (RID,PASID) indicate values taken from the PCI packet.
> 
> In principle the IOMMU could also be commanded to reuse the same
> ioasid page table with a different PASID:
> 
>   PCI_DEVICE_B -> (RID_B,PASID_B) -> nested_ioasid -> gpa_ioasid_id -> physical
> 
> This is impossible if the ioasid == PASID in the API.
> 
>> Below I list different scenarios for ATTACH_IOASID in my view. Here 
>> vfio_device could be a real PCI function (RID), or a subfunction device 
>> (RID+def_ioasid). 
> 
> What is RID+def_ioasid? The IOMMU does not match on IOASID's.
> 
> A subfunction device always needs to use PASID, or an internal IOMMU,
> confused what you are trying to explain?
> 
>> If the whole PASID table is delegated to the guest in ARM case, the guest
>> can select its own PASIDs w/o telling the hypervisor. 
> 
> The hypervisor has to route the PASID's to the guest at some point - a
> guest can't just claim a PASID unilaterally, that would not be secure.
AFAIU On ARM the stage 2 table is uniquely defined per 

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-16 Thread Auger Eric
Hi Jason,

On 4/16/21 4:34 PM, Jason Gunthorpe wrote:
> On Fri, Apr 16, 2021 at 04:26:19PM +0200, Auger Eric wrote:
> 
>> This was largely done during several confs including Plumbers, KVM Forum,
>> over several years. Also, API docs were shared on the ML. I don't remember
>> any objection being raised at those moments.
> 
> I don't think anyone objects to the high level ideas, but
> implementation does matter. I don't think anyone presented "hey we
> will tunnel an uAPI through VFIO to the IOMMU subsystem" - did they?

At minimum
https://events19.linuxfoundation.cn/wp-content/uploads/2017/11/Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf

But most obviously everything is documented in
Documentation/userspace-api/iommu.rst where the VFIO tunneling is
clearly stated ;-)

But well let's work together to design a better and more elegant
solution then.

Thanks

Eric
> 
> Look at the fairly simple IMS situation, for example. This was
> presented at plumbers too, and the slides were great - but the
> implementation was too hacky. It required a major rework of the x86
> interrupt handling before it was OK.
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-16 Thread Auger Eric
Hi,
On 4/16/21 4:05 PM, Jason Gunthorpe wrote:
> On Fri, Apr 16, 2021 at 03:38:02PM +0200, Auger Eric wrote:
> 
>> The redesign requirement came pretty late in the development process.
>> The iommu user API has been upstream for a while, and the VFIO interfaces
>> were submitted a long time ago and have been under review for quite some time.
>> Redesigning everything with a different API, undefined at this point, is
>> a major setback for our work and will have a large impact on the
>> introduction of features companies are looking forward to, hence our
>> frustration.
> 
> I will answer both you and Jacob at once.
> 
> This is uAPI, once it is set it can never be changed.
> 
> The kernel process and philosophy is to invest heavily in uAPI
> development and review to converge on the best uAPI possible.
> 
> Many past submissions have taken a long time to get this right, there
> are several high profile uAPI examples.
> 
> Do you think this case is so special, or the concerns so minor, that it
> should get to bypass all of the normal process?

It is not my intent to bypass any process. I am just trying to
understand what needs to be re-designed and for what use case.
> 
> Ask yourself, is anyone advocating for the current direction on
> technical merits alone?
> 
> Certainly the patches I last saw were completely disgusting from a
> uAPI design perspective.
> 
> It was against the development process to organize this work the way
> it was done. Merging a whack of dead code to the kernel to support a
> uAPI vision that was never clearly articulated was a big mistake.
> 
> Start from the beginning. Invest heavily in defining a high quality
> uAPI. Clearly describe the uAPI to all stakeholders.
This was largely done during several confs including Plumbers, KVM Forum,
over several years. Also, API docs were shared on the ML. I don't remember
any objection being raised at those moments.

> Break up the
> implementation into patch series without dead code. Make the
> patches. Remove the dead code this group has already added.
> 
> None of this should be a surprise. The VDPA discussion and related
> "what is a mdev" over a year ago made it pretty clear VFIO is not the
> exclusive user of "IOMMU in userspace" and that places limits on what
> kind of uAPIs expansion it should experience going forward.
Maybe clear for you but most probably not for many other stakeholders.

Anyway I do not intend to further argue and I will be happy to learn
from you and work with you, Jacob, Liu and all other stakeholders to
define a better integration.

Thanks

Eric
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-16 Thread Auger Eric
Hi Jason,

On 4/16/21 1:07 AM, Jason Gunthorpe wrote:
> On Thu, Apr 15, 2021 at 03:11:19PM +0200, Auger Eric wrote:
>> Hi Jason,
>>
>> On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
>>> On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote:
>>>
>>>> DMA page faults are delivered to root-complex via page request message and
>>>> it is per-device according to PCIe spec. Page request handling flow is:
>>>>
>>>> 1) iommu driver receives a page request from device
>>>> 2) iommu driver parses the page request message. Get the RID,PASID, faulted
>>>>    page and requested permissions etc.
>>>> 3) iommu driver triggers fault handler registered by device driver with
>>>>    iommu_report_device_fault()
>>>
>>> This seems confused.
>>>
>>> The PASID should define how to handle the page fault, not the driver.
>>
>> In my series I don't use PASID at all. I am just enabling nested stage
>> and the guest uses a single context. I don't allocate any user PASID at
>> any point.
>>
>> When there is a fault at the physical level (a stage 1 fault that concerns
>> the guest), it needs to be reported and injected into the
>> guest. The vfio pci driver registers a fault handler with the iommu layer,
>> and in that fault handler it fills a circ buffer and triggers an eventfd
>> that is listened to by the VFIO-PCI QEMU device. The latter retrieves
>> the fault from the mmapped circ buffer, knows which vIOMMU it is
>> attached to, and passes the fault to the vIOMMU.
>> Then the vIOMMU triggers an IRQ in the guest.
>>
>> We are reusing the existing concepts from VFIO, region, IRQ to do that.
>>
>> For that use case, would you also use /dev/ioasid?
> 
> /dev/ioasid could do all the things you described vfio-pci as doing,
> it can even do them the same way you just described.
> 
> Stated another way, do you plan to duplicate all of this code someday
> for vfio-cxl? What about for vfio-platform? ARM SMMU can be hooked to
> platform devices, right?
vfio regions and IRQ related APIs are common user interfaces exposed by
all vfio drivers, including platform. Then the actual circular buffer
implementation details can be put in a common lib.

As for the thin vfio iommu wrappers, the ones you don't like, they are
implemented in type1 code.

Maybe the need for /dev/ioasid is more pressing for PASID management, but
for the nested use case that is not obvious to me, and in your different
replies it was not crystal clear where that use case belongs.

The redesign requirement came pretty late in the development process.
The iommu user API has been upstream for a while, and the VFIO interfaces
were submitted a long time ago and have been under review for quite some time.
Redesigning everything with a different API, undefined at this point, is
a major setback for our work and will have a large impact on the
introduction of features companies are looking forward to, hence our
frustration.

Thanks

Eric


> 
> I feel what you guys are struggling with is some choice in the iommu
> kernel APIs that cause the events to be delivered to the pci_device
> owner, not the PASID owner.
> 
> That feels solvable.
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-15 Thread Auger Eric
Hi Jason,

On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
> On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote:
> 
>> DMA page faults are delivered to root-complex via page request message and
>> it is per-device according to PCIe spec. Page request handling flow is:
>>
>> 1) iommu driver receives a page request from device
>> 2) iommu driver parses the page request message. Get the RID,PASID, faulted
>>    page and requested permissions etc.
>> 3) iommu driver triggers fault handler registered by device driver with
>>    iommu_report_device_fault()
> 
> This seems confused.
> 
> The PASID should define how to handle the page fault, not the driver.

In my series I don't use PASID at all. I am just enabling nested stage
and the guest uses a single context. I don't allocate any user PASID at
any point.

When there is a fault at the physical level (a stage 1 fault that concerns
the guest), it needs to be reported and injected into the
guest. The vfio pci driver registers a fault handler with the iommu layer,
and in that fault handler it fills a circ buffer and triggers an eventfd
that is listened to by the VFIO-PCI QEMU device. The latter retrieves
the fault from the mmapped circ buffer, knows which vIOMMU it is
attached to, and passes the fault to the vIOMMU.
Then the vIOMMU triggers an IRQ in the guest.

We are reusing the existing concepts from VFIO, region, IRQ to do that.
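
The producer/consumer side of that flow can be sketched as a small ring of
fault records. This is a single-threaded userspace illustration only: the
record layout and the names (struct fault_rec, struct fault_ring, ring_push,
ring_pop) are hypothetical, the eventfd is modeled as a plain counter, and a
real shared ring would also need memory barriers between the two sides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8             /* must be a power of two */

/* Hypothetical fault record; the real layout is not specified here. */
struct fault_rec {
    uint64_t addr;              /* faulting address */
    uint32_t sid;               /* stream / requester id */
    uint32_t flags;
};

struct fault_ring {
    uint32_t head;              /* next slot the producer writes */
    uint32_t tail;              /* next slot the consumer reads */
    uint64_t pending;           /* stand-in for the eventfd counter */
    struct fault_rec rec[RING_SIZE];
};

/* Producer side (the fault handler): queue a record and signal. */
static int ring_push(struct fault_ring *r, const struct fault_rec *f)
{
    assert(r != NULL && f != NULL);
    if (r->head - r->tail == RING_SIZE)
        return -1;              /* ring full: the record is dropped */
    r->rec[r->head % RING_SIZE] = *f;
    r->head++;
    r->pending++;               /* a real producer would write the eventfd */
    return 0;
}

/* Consumer side (the VMM): fetch the oldest record, if any. */
static int ring_pop(struct fault_ring *r, struct fault_rec *out)
{
    assert(r != NULL && out != NULL);
    if (r->head == r->tail)
        return -1;              /* ring empty */
    *out = r->rec[r->tail % RING_SIZE];
    r->tail++;
    return 0;
}
```

The free-running head and tail indexes keep full and empty distinguishable
without wasting a slot, as long as RING_SIZE stays a power of two.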

For that use case, would you also use /dev/ioasid?

Thanks

Eric
> 
> I don't remember any device specific actions in ATS, so what is the
> driver supposed to do?
> 
>> 4) device driver's fault handler signals an event FD to notify userspace to
>>    fetch the information about the page fault. If it's VM case, inject the
>>    page fault to VM and let guest to solve it.
> 
> If the PASID is set to 'report page fault to userspace' then some
> event should come out of /dev/ioasid, or be reported to a linked
> eventfd, or whatever.
> 
> If the PASID is set to 'SVM' then the fault should be passed to
> handle_mm_fault
> 
> And so on.
> 
> Userspace chooses what happens based on how they configure the PASID
> through /dev/ioasid.
> 
> Why would a device driver get involved here?
> 
>> Eric has sent below series for the page fault reporting for VM with passthru
>> device.
>> https://lore.kernel.org/kvm/20210223210625.604517-5-eric.au...@redhat.com/
> 
> It certainly should not be in vfio pci. Everything using a PASID needs
> this infrastructure, VDPA, mdev, PCI, CXL, etc.
> 
> Jason
> 



Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node

2021-04-15 Thread Auger Eric
Hi Shameer,

+ Jean-Philippe


On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> RFC v1 --> v2:
>  - Added a generic interface for IOMMU drivers to retrieve all the 
>    RMR info associated with a given IOMMU.
>  - SMMUv3 driver gets the RMR list during probe() and installs
>    bypass STEs for all the SIDs in the RMR list. This is to keep
>    the ongoing traffic alive(if any) during SMMUv3 reset. This is
>    based on the suggestions received for v1 to take care of the
>    EFI framebuffer use case. Only sanity tested for now.
>  - During the probe/attach device, SMMUv3 driver reserves any
>    RMR region associated with the device such that there is a unity
>    mapping for them in SMMU.
> ---    
> 
> The series adds support to IORT RMR nodes specified in IORT
> Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory
> ranges that are used by endpoints and require a unity mapping
> in SMMU.
> 
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these controllers
> make use of host memory for various caching related purposes and when
> SMMU is enabled the iMR firmware fails to access these memory regions
> as there is no mapping for them. IORT RMR provides a way for UEFI to
> describe and report these memory regions so that the kernel can make
> a unity mapping for these in SMMU.
> 
> RFC because, Patch #1 is to update the actbl2.h and should be done
> through acpica update. I have sent out a pull request[1] for that.
> 
> Tests:
> 
> With a UEFI, that reports the RMR for the dev,
> 
> [16F0h 5872   1] Type : 06
> [16F1h 5873   2]   Length : 007C
> [16F3h 5875   1] Revision : 00
> [1038h 0056   2] Reserved : 
> [1038h 0056   2]   Identifier : 
> [16F8h 5880   4]Mapping Count : 0001
> [16FCh 5884   4]   Mapping Offset : 0040
> 
> [1700h 5888   4]Number of RMR Descriptors : 0002
> [1704h 5892   4]RMR Descriptor Offset : 0018
> 
> [1708h 5896   8]  Base Address of RMR : E640
> [1710h 5904   8]Length of RMR : 0010
> [1718h 5912   4] Reserved : 
> 
> [171Ch 5916   8]  Base Address of RMR : 27B0
> [1724h 5924   8]Length of RMR : 00C0
> [172Ch 5932   4] Reserved : 
> 
> [1730h 5936   4]   Input base : 
> [1734h 5940   4] ID Count : 0001
> [1738h 5944   4]  Output Base : 0003
> [173Ch 5948   4] Output Reference : 0064
> [1740h 5952   4]Flags (decoded below) : 0001
>Single Mapping : 1

Following Jean-Philippe's suggestion I have used your series for nested
stage SMMUv3 integration, ie. to simplify the MSI nested stage mapping.

Host allocates hIOVA -> physical doorbell (pDB) as it normally does for
VFIO device passthrough. IOVA Range is 0x800 - 0x810.

I expose this MSI IOVA range to the guest as an RMR and as a result
the guest has a flat mapping for this range. As the physical device is
programmed with hIOVA we have the following mapping:

IOVA        IPA        PA
hIOVA  ->  hIOVA  ->  pDB
       S1         S2

This works.

The only weird thing is that I need to expose 256 RMRs due to the
'Single Mapping' mandatory flag. I need to have 1 RMR per potential SID
on the bus.

I will post a new version of SMMUv3 nested stage soon for people to test
& compare. Obviously this removes a bunch of code on both SMMU/VFIO and
QEMU code so I think this solution looks better overall.

Thanks

Eric
> ...
> 
> Without the series the RAID controller initialization fails as
> below,
> 
> ...
> [   12.631117] megaraid_sas :03:00.0: FW supports sync cache: Yes
> [   12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x4009
> [   18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED for SCSI host 0
> [   23.019383] megaraid_sas :03:00.0: Waiting for FW to come to ready state
> [  106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault code:0x1 subcode:0x0 func:megasas_transition_to_ready
> [  106.695186] megaraid_sas :03:00.0: System Register set:
> [  106.889787] megaraid_sas :03:00.0: Failed to transition controller to ready for scsi0.
> [  106.910475] megaraid_sas :03:00.0: Failed from megasas_init_fw 6407
> estuary:/$
> 
> With the series, now the kernel has direct mapping for the dev as
> below,
> 
> 

Re: [RFC PATCH v2 2/8] ACPI/IORT: Add support for RMR node parsing

2021-04-15 Thread Auger Eric
Hi Shameer,
On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> Add support for parsing RMR node information from ACPI.
> Find associated stream ids and smmu node info from the
> RMR node and populate a linked list with RMR memory
> descriptors.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/acpi/arm64/iort.c | 122 +-
>  1 file changed, 121 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 9929ff50c0c0..a9705aa35028 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -40,6 +40,25 @@ struct iort_fwnode {
>  static LIST_HEAD(iort_fwnode_list);
>  static DEFINE_SPINLOCK(iort_fwnode_lock);
>  
> +struct iort_rmr_id {
> + u32  sid;
> + struct acpi_iort_node *smmu;
> +};
> +
> +/*
> + * One entry for IORT RMR.
> + */
> +struct iort_rmr_entry {
> + struct list_head list;
> +
> + unsigned int rmr_ids_num;
> + struct iort_rmr_id *rmr_ids;
> +
> + struct acpi_iort_rmr_desc *rmr_desc;
> +};
> +
> +static LIST_HEAD(iort_rmr_list); /* list of RMR regions from ACPI */
> +
>  /**
>   * iort_set_fwnode() - Create iort_fwnode and use it to register
>   *  iommu data in the iort_fwnode_list
> @@ -393,7 +412,8 @@ static struct acpi_iort_node *iort_node_get_id(struct acpi_iort_node *node,
>   if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT ||
>   node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX ||
>   node->type == ACPI_IORT_NODE_SMMU_V3 ||
> - node->type == ACPI_IORT_NODE_PMCG) {
> + node->type == ACPI_IORT_NODE_PMCG ||
> + node->type == ACPI_IORT_NODE_RMR) {
>   *id_out = map->output_base;
>   return parent;
>   }
> @@ -1647,6 +1667,103 @@ static void __init iort_enable_acs(struct 
> acpi_iort_node *iort_node)
>  #else
>  static inline void iort_enable_acs(struct acpi_iort_node *iort_node) { }
>  #endif
> +static int iort_rmr_desc_valid(struct acpi_iort_rmr_desc *desc)
> +{
> + struct iort_rmr_entry *e;
> + u64 end, start = desc->base_address, length = desc->length;
> +
> + if (!IS_ALIGNED(start, SZ_64K) || !IS_ALIGNED(length, SZ_64K))
> + return -EINVAL;
> +
> + end = start + length - 1;
> +
> + /* Check for address overlap */
I don't get this check. What is the problem if you attach the same range
to different stream IDs? Shouldn't you check that there is no overlap for
the same SID?
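
To illustrate what I mean, here is a minimal userspace sketch of a per-SID
overlap check. It is only an assumption of how this could look, not a
proposal for the actual code: struct rmr_range, ranges_overlap() and
rmr_conflicts() are hypothetical names, and the entries are flattened to one
(sid, range) pair each. The overlap condition is the same closed-interval
test as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view: one (sid, range) pair per entry. */
struct rmr_range {
	uint32_t sid;    /* stream ID the range is attached to */
	uint64_t base;   /* first byte of the range */
	uint64_t length; /* size in bytes, assumed non-zero */
};

/* Closed-interval overlap test, same condition as in the patch. */
static bool ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 <= e2 && e1 >= s2;
}

/*
 * Return true if @cand overlaps an already recorded range for the SAME sid.
 * Identical ranges attached to different sids are accepted.
 */
static bool rmr_conflicts(const struct rmr_range *known, size_t n,
			  const struct rmr_range *cand)
{
	uint64_t c_end = cand->base + cand->length - 1;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t k_end = known[i].base + known[i].length - 1;

		if (known[i].sid != cand->sid)
			continue;
		if (ranges_overlap(cand->base, c_end, known[i].base, k_end))
			return true;
	}
	return false;
}
```

With such a check, the same 64K region attached to SID 3 and SID 4 would be
accepted, while two overlapping regions for SID 3 would still be rejected.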


> + list_for_each_entry(e, &iort_rmr_list, list) {
> + u64 e_start = e->rmr_desc->base_address;
> + u64 e_end = e_start + e->rmr_desc->length - 1;
> +
> + if (start <= e_end && end >= e_start)
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int __init iort_parse_rmr(struct acpi_iort_node *iort_node)
> +{
> + struct iort_rmr_id *rmr_ids, *ids;
> + struct iort_rmr_entry *e;
> + struct acpi_iort_rmr *rmr;
> + struct acpi_iort_rmr_desc *rmr_desc;
> + u32 map_count = iort_node->mapping_count;
> + int i, ret = 0, desc_count = 0;
> +
> + if (iort_node->type != ACPI_IORT_NODE_RMR)
> + return 0;
> +
> + if (!iort_node->mapping_offset || !map_count) {
> + pr_err(FW_BUG "Invalid ID mapping, skipping RMR node %p\n",
> +iort_node);
> + return -EINVAL;
> + }
> +
> + rmr_ids = kmalloc(sizeof(*rmr_ids) * map_count, GFP_KERNEL);
> + if (!rmr_ids)
> + return -ENOMEM;
> +
> + /* Retrieve associated smmu and stream id */
> + ids = rmr_ids;
nit: do you need both rmr_ids and ids?
> + for (i = 0; i < map_count; i++, ids++) {
> + ids->smmu = iort_node_get_id(iort_node, &ids->sid, i);
> + if (!ids->smmu) {
> + pr_err(FW_BUG "Invalid SMMU reference, skipping RMR 
> node %p\n",
> +iort_node);
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + /* Retrieve RMR data */
> + rmr = (struct acpi_iort_rmr *)iort_node->node_data;
> + if (!rmr->rmr_offset || !rmr->rmr_count) {
> + pr_err(FW_BUG "Invalid RMR descriptor array, skipping RMR node 
> %p\n",
> +iort_node);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + rmr_desc = ACPI_ADD_PTR(struct acpi_iort_rmr_desc, iort_node,
> + rmr->rmr_offset);
> +
> + for (i = 0; i < rmr->rmr_count; i++, rmr_desc++) {
> + ret = iort_rmr_desc_valid(rmr_desc);
> + if (ret) {
> + pr_err(FW_BUG "Invalid RMR descriptor[%d] for node %p, 
> skipping...\n",
> +i, iort_node);
> + goto out;
so I understand you skip the whole node and not just that rmr desc,
otherwise you would continue. so in 

Re: [RFC PATCH v2 1/8] ACPICA: IORT: Update for revision E

2021-04-15 Thread Auger Eric
Hi Shameer,

On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> IORT revision E contains a few additions like,
>     -Added an identifier field in the node descriptors to aid table
>      cross-referencing.
>     -Introduced the Reserved Memory Range(RMR) node. This is used
>      to describe memory ranges that are used by endpoints and requires
>      a unity mapping in SMMU.
> -Introduced a flag in the RC node to express support for PRI.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  include/acpi/actbl2.h | 25 +++--
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index ec66779cb193..274fce7b5c01 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -68,7 +68,7 @@
>   * IORT - IO Remapping Table
>   *
>   * Conforms to "IO Remapping Table System Software on ARM Platforms",
> - * Document number: ARM DEN 0049D, March 2018
> + * Document number: ARM DEN 0049E, June 2020
>   *
>   
> **/
>  
> @@ -86,7 +86,8 @@ struct acpi_iort_node {
>   u8 type;
>   u16 length;
>   u8 revision;
> - u32 reserved;
> + u16 reserved;
> + u16 identifier;
>   u32 mapping_count;
>   u32 mapping_offset;
>   char node_data[1];
> @@ -100,7 +101,8 @@ enum acpi_iort_node_type {
>   ACPI_IORT_NODE_PCI_ROOT_COMPLEX = 0x02,
>   ACPI_IORT_NODE_SMMU = 0x03,
>   ACPI_IORT_NODE_SMMU_V3 = 0x04,
> - ACPI_IORT_NODE_PMCG = 0x05
> + ACPI_IORT_NODE_PMCG = 0x05,
> + ACPI_IORT_NODE_RMR = 0x06,
>  };
>  
>  struct acpi_iort_id_mapping {
> @@ -167,10 +169,10 @@ struct acpi_iort_root_complex {
>   u8 reserved[3]; /* Reserved, must be zero */
>  };
>  
> -/* Values for ats_attribute field above */
> +/* Masks for ats_attribute field above */
>  
> -#define ACPI_IORT_ATS_SUPPORTED 0x0001   /* The root complex 
> supports ATS */
> -#define ACPI_IORT_ATS_UNSUPPORTED   0x   /* The root complex 
> doesn't support ATS */
> +#define ACPI_IORT_ATS_SUPPORTED (1)  /* The root complex supports 
> ATS */
> +#define ACPI_IORT_PRI_SUPPORTED (1<<1)   /* The root complex 
> supports PRI */
>  
>  struct acpi_iort_smmu {
>   u64 base_address;   /* SMMU base address */
> @@ -241,6 +243,17 @@ struct acpi_iort_pmcg {
>   u64 page1_base_address;
>  };
>  
> +struct acpi_iort_rmr {
so indeed in E.b there is a new field here.
u32 flags
> + u32 rmr_count;
> + u32 rmr_offset;
> +};
> +
> +struct acpi_iort_rmr_desc {
> + u64 base_address;
> + u64 length;
> + u32 reserved;
> +};
> +
>  
> /***
>   *
>   * IVRS - I/O Virtualization Reporting Structure
> 
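
For reference, a small sketch of the layout I have in mind with the extra
field. This is an assumption on my side, not the ACPICA definition: the
*_sketch names and rmr_desc_offset() are hypothetical, and I assume the flags
field comes first, as in my comment above. The structures are byte-packed
like the other ACPI tables; the table dump in the cover letter lists the two
descriptors at 1708h and 171Ch, i.e. 20 bytes apart, which matches a packed
descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ACPI tables are byte-packed, so mirror that here. */
#pragma pack(push, 1)

/* RMR node body; the leading flags field is the assumed E.b addition. */
struct rmr_node_sketch {
	uint32_t flags;
	uint32_t rmr_count;   /* number of descriptors */
	uint32_t rmr_offset;  /* offset of the descriptor array from node start */
};

/* One memory range descriptor, fields as in the patch. */
struct rmr_desc_sketch {
	uint64_t base_address;
	uint64_t length;
	uint32_t reserved;
};

#pragma pack(pop)

/* Byte offset of descriptor @idx from the start of the descriptor array. */
static size_t rmr_desc_offset(size_t idx)
{
	return idx * sizeof(struct rmr_desc_sketch);
}
```

Without the packing the descriptor would be padded to 24 bytes on most
64-bit ABIs and walking the array with rmr_desc++ would go off the rails.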
Thanks

Eric

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v12 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-04-11 Thread Auger Eric
Hi Zenghui,

On 4/7/21 11:33 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 5:06, Eric Auger wrote:
>> +/*
>> + * VFIO_IOMMU_SET_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
>> + *    struct vfio_iommu_type1_set_pasid_table)
>> + *
>> + * The SET operation passes a PASID table to the host while the
>> + * UNSET operation detaches the one currently programmed. Setting
>> + * a table while another is already programmed replaces the old table.
> 
> It looks to me that this description doesn't match the IOMMU part.

Yep, that's misleading.

I replaced it with:

 It is allowed to "SET" the table several times without un-setting as
 long as the table config does not stay IOMMU_PASID_CONFIG_TRANSLATE.

> 
> [v14,05/13] iommu/smmuv3: Implement attach/detach_pasid_table
> 
> |    case IOMMU_PASID_CONFIG_TRANSLATE:
> |    /* we do not support S1 <-> S1 transitions */
> |    if (smmu_domain->s1_cfg.set)
> |    goto out;
> 
> Maybe I've misread something?
> 
> 
> Thanks,
> Zenghui
> 

Thanks

Eric

Re: [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie

2021-04-10 Thread Auger Eric
Hi Zenghui,

On 4/7/21 9:39 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> Up to now, when the type was UNMANAGED, we used to
>> allocate IOVA pages within a reserved IOVA MSI range.
>>
>> If both the host and the guest are exposed with SMMUs, each
>> would allocate an IOVA. The guest allocates an IOVA (gIOVA)
>> to map onto the guest MSI doorbell (gDB). The Host allocates
>> another IOVA (hIOVA) to map onto the physical doorbell (hDB).
>>
>> So we end up with 2 unrelated mappings, at S1 and S2:
>>   S1 S2
>> gIOVA    -> gDB
>>     hIOVA    ->    hDB
>>
>> The PCI device would be programmed with hIOVA.
>> No stage 1 mapping would exist, causing the MSIs to fault.
>>
>> iommu_dma_bind_guest_msi() allows to pass gIOVA/gDB
>> to the host so that gIOVA can be used by the host instead of
>> re-allocating a new hIOVA.
>>
>>   S1   S2
>> gIOVA    ->    gDB    ->    hDB
>>
>> this time, the PCI device can be programmed with the gIOVA MSI
>> doorbell which is correctly mapped through both stages.
>>
>> Nested mode is not compatible with HW MSI regions as in that
>> case gDB and hDB should have a 1-1 mapping. This check will
>> be done when attaching each device to the IOMMU domain.
>>
>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index f659395e7959..d25eb7cecaa7 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -19,6 +19,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
> 
> Duplicated include.
sure
> 
>>   #include 
>>   #include 
>>   #include 
>> @@ -29,12 +30,15 @@
>>   struct iommu_dma_msi_page {
>>   struct list_head    list;
>>   dma_addr_t    iova;
>> +    dma_addr_t    gpa;
>>   phys_addr_t    phys;
>> +    size_t    s1_granule;
>>   };
>>     enum iommu_dma_cookie_type {
>>   IOMMU_DMA_IOVA_COOKIE,
>>   IOMMU_DMA_MSI_COOKIE,
>> +    IOMMU_DMA_NESTED_MSI_COOKIE,
>>   };
>>     struct iommu_dma_cookie {
>> @@ -46,6 +50,7 @@ struct iommu_dma_cookie {
>>   dma_addr_t    msi_iova;
> 
> msi_iova is unused in the nested mode, but we still set it to the start
> address of the RESV_SW_MSI region (in iommu_get_msi_cookie()), which
> looks a bit strange to me.
I agree with you
> 
>>   };
>>   struct list_head    msi_page_list;
>> +    spinlock_t    msi_lock;
> 
> Should msi_lock be grabbed everywhere msi_page_list is populated?
> Especially in iommu_dma_get_msi_page(), which can be invoked from the
> irqchip driver.
Yes I agree
> 
>>     /* Domain for flush queue callback; NULL if flush queue not in
>> use */
>>   struct iommu_domain    *fq_domain;
>> @@ -87,6 +92,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum
>> iommu_dma_cookie_type type)
>>     cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
>>   if (cookie) {
>> +    spin_lock_init(&cookie->msi_lock);
>>   INIT_LIST_HEAD(&cookie->msi_page_list);
>>   cookie->type = type;
>>   }
>> @@ -120,14 +126,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
>>    *
>>    * Users who manage their own IOVA allocation and do not want DMA
>> API support,
>>    * but would still like to take advantage of automatic MSI
>> remapping, can use
>> - * this to initialise their own domain appropriately. Users should
>> reserve a
>> + * this to initialise their own domain appropriately. Users may
>> reserve a
>>    * contiguous IOVA region, starting at @base, large enough to
>> accommodate the
>>    * number of PAGE_SIZE mappings necessary to cover every MSI
>> doorbell address
>> - * used by the devices attached to @domain.
>> + * used by the devices attached to @domain. The other way round is to
>> provide
>> + * usable iova pages through the iommu_dma_bind_doorbell API (nested
>> stages
> 
> s/iommu_dma_bind_doorbell/iommu_dma_bind_guest_msi/ ?
correct
> 
>> + * use case)
>>    */
>>   int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>>   {
>>   struct iommu_dma_cookie *cookie;
>> +    int nesting, ret;
>>     if (domain->type != IOMMU_DOMAIN_UNMANAGED)
>>   return -EINVAL;
>> @@ -135,7 +144,12 @@ int iommu_get_msi_cookie(struct iommu_domain
>> *domain, dma_addr_t base)
>>   if (domain->iova_cookie)
>>   return -EEXIST;
>>   -    cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
>> +    ret =  iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);
> 
> Redundant space.
yep
> 
>> +    if (!ret && nesting)
>> +    cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
>> +    else
>> +    cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
>> +
>>   if (!cookie)
>>   return -ENOMEM;
>>   @@ -156,6 +170,7 @@ void iommu_put_dma_cookie(struct iommu_domain
>> *domain)
>>   {
>>   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>   struct iommu_dma_msi_page *msi, *tmp;
>> +    bool s2_unmap = false;
>>     if (!cookie)

Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node

2021-04-09 Thread Auger Eric
Hi Shameer,

On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> RFC v1 --> v2:
>  - Added a generic interface for IOMMU drivers to retrieve all the 
>    RMR info associated with a given IOMMU.
>  - SMMUv3 driver gets the RMR list during probe() and installs
>    bypass STEs for all the SIDs in the RMR list. This is to keep
>    the ongoing traffic alive(if any) during SMMUv3 reset. This is
>    based on the suggestions received for v1 to take care of the
>    EFI framebuffer use case. Only sanity tested for now.
>  - During the probe/attach device, SMMUv3 driver reserves any
>    RMR region associated with the device such that there is a unity
>    mapping for them in SMMU.
> ---    
> 
> The series adds support to IORT RMR nodes specified in IORT
> Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory
> ranges that are used by endpoints and require a unity mapping
> in SMMU.
> 
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these controllers
> make use of host memory for various caching related purposes and when
> SMMU is enabled the iMR firmware fails to access these memory regions
> as there is no mapping for them. IORT RMR provides a way for UEFI to
> describe and report these memory regions so that the kernel can make
> a unity mapping for these in SMMU.
> 
> RFC because, Patch #1 is to update the actbl2.h and should be done
> through acpica update. I have send out a pull request[1] for that.

What is the state of this series? I wondered if I should consider using
it for nested SMMU to avoid handling nested binding for MSI, as
suggested by Jean. Are there any blockers?

Thanks

Eric
> 
> Tests:
> 
> With a UEFI, that reports the RMR for the dev,
> 
> [16F0h 5872   1] Type : 06
> [16F1h 5873   2]   Length : 007C
> [16F3h 5875   1] Revision : 00
> [1038h 0056   2] Reserved : 
> [1038h 0056   2]   Identifier : 
> [16F8h 5880   4]Mapping Count : 0001
> [16FCh 5884   4]   Mapping Offset : 0040
> 
> [1700h 5888   4]Number of RMR Descriptors : 0002
> [1704h 5892   4]RMR Descriptor Offset : 0018
> 
> [1708h 5896   8]  Base Address of RMR : E640
> [1710h 5904   8]Length of RMR : 0010
> [1718h 5912   4] Reserved : 
> 
> [171Ch 5916   8]  Base Address of RMR : 27B0
> [1724h 5924   8]Length of RMR : 00C0
> [172Ch 5932   4] Reserved : 
> 
> [1730h 5936   4]   Input base : 
> [1734h 5940   4] ID Count : 0001
> [1738h 5944   4]  Output Base : 0003
> [173Ch 5948   4] Output Reference : 0064
> [1740h 5952   4]Flags (decoded below) : 0001
>Single Mapping : 1
> ...
> 
> Without the series the RAID controller initialization fails as
> below,
> 
> ...
> [   12.631117] megaraid_sas :03:00.0: FW supports sync cache: Yes 
>   
> [   12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is 
> called outbound_intr_mask:0x4009  
>  
> [   18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED for 
> SCSI host 0   
>   
> [   23.019383] megaraid_sas :03:00.0: Waiting for FW to come to ready 
> state 
> [  106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault 
> code:0x1 subcode:0x0 func:megasas_transition_to_ready 
>
> [  106.695186] megaraid_sas :03:00.0: System Register set:
>   
> [  106.889787] megaraid_sas :03:00.0: Failed to transition controller to 
> ready for scsi0.  
>  
> [  106.910475] megaraid_sas :03:00.0: Failed from megasas_init_fw 6407
>   
> estuary:/$
> 
> With the series, now the kernel has direct mapping for the dev as
> below,
> 
> estuary:/$ cat /sys/kernel/iommu_groups/0/reserved_regions
>   
> 0x0800 0x080f msi 
>   
> 0x27b0 0x286f direct  
>   
> 0xe640 0xe64f direct  
>   
> estuary:/$
> 
> 
> [   12.254318] megaraid_sas :03:00.0: megasas_disable_intr_fusion is 
> called outbound_intr_mask:0x4009  
>  
> [   12.739089] megaraid_sas :03:00.0: FW provided supportMaxExtLDs: 0 
>  max_lds: 32  
> 
> [   12.746628] megaraid_sas :03:00.0: controller type   : 

Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-09 Thread Auger Eric
Hi Kunkun,

On 4/9/21 6:48 AM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/4/8 20:30, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/1/21 2:37 PM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> With nested stage support, soon we will need to invalidate
>>>> S1 contexts and ranges tagged with an unmanaged asid, this
>>>> latter being managed by the guest. So let's introduce 2 helpers
>>>> that allow to invalidate with externally managed ASIDs
>>>>
>>>> Signed-off-by: Eric Auger 
>>>>
>>>> ---
>>>>
>>>> v13 -> v14
>>>> - Actually send the NH_ASID command (reported by Xingang Wang)
>>>> ---
>>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38
>>>> -
>>>>    1 file changed, 29 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 5579ec4fccc8..4c19a1114de4 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>>>> arm_smmu_domain *smmu_domain, int ssid,
>>>>    }
>>>>      /* IO_PGTABLE API */
>>>> -static void arm_smmu_tlb_inv_context(void *cookie)
>>>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>>>> *smmu_domain,
>>>> +   int ext_asid)
>>>>    {
>>>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>>>    struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>    struct arm_smmu_cmdq_ent cmd;
>>>>    @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>>>> *cookie)
>>>>     * insertion to guarantee those are observed before the TLBI.
>>>> Do be
>>>>     * careful, 007.
>>>>     */
>>>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>>> +    cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>>>> +    cmd.tlbi.asid    = ext_asid;
>>>> +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>>> +    arm_smmu_cmdq_issue_sync(smmu);
>>>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>    arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>>>    } else {
>>>>    cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>>>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void
>>>> *cookie)
>>>>    arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>>>    }
>>>>    +static void arm_smmu_tlb_inv_context(void *cookie)
>>>> +{
>>>> +    struct arm_smmu_domain *smmu_domain = cookie;
>>>> +
>>>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>>>> +}
>>>> +
>>>>    static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>>>     unsigned long iova, size_t size,
>>>>     size_t granule,
>>>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>>>> arm_smmu_cmdq_ent *cmd,
>>>>    arm_smmu_cmdq_batch_submit(smmu, &cmds);
>>>>    }
>>>>    
>>> Here is the part of code in __arm_smmu_tlb_inv_range():
>>>>  if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>>  /* Get the leaf page size */
>>>>  tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>>>
>>>>  /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>>>  cmd->tlbi.tg = (tg - 10) / 2;
>>>>
>>>>  /* Determine what level the granule is at */
>>>>  cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>>>
>>>>  num_pages = size >> tg;
>>>>  }
>>> When pSMMU supports RIL, we get the leaf page size by
>>> __ffs(smmu_domain->
>>> domain.pgsize_bitmap). In nested mode, it is determined by host
>>> PAGE_SIZE. If
>>> the host kernel and guest kernel has different translation

Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-08 Thread Auger Eric
Hi Kunkun,

On 4/1/21 2:37 PM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> With nested stage support, soon we will need to invalidate
>> S1 contexts and ranges tagged with an unmanaged asid, this
>> latter being managed by the guest. So let's introduce 2 helpers
>> that allow to invalidate with externally managed ASIDs
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v13 -> v14
>> - Actually send the NH_ASID command (reported by Xingang Wang)
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 -
>>   1 file changed, 29 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 5579ec4fccc8..4c19a1114de4 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>> arm_smmu_domain *smmu_domain, int ssid,
>>   }
>>     /* IO_PGTABLE API */
>> -static void arm_smmu_tlb_inv_context(void *cookie)
>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>> *smmu_domain,
>> +   int ext_asid)
>>   {
>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>   struct arm_smmu_device *smmu = smmu_domain->smmu;
>>   struct arm_smmu_cmdq_ent cmd;
>>   @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>    * insertion to guarantee those are observed before the TLBI. Do be
>>    * careful, 007.
>>    */
>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>> +    cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>> +    cmd.tlbi.asid    = ext_asid;
>> +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>> +    arm_smmu_cmdq_issue_sync(smmu);
>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>   arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>   } else {
>>   cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>   arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>   }
>>   +static void arm_smmu_tlb_inv_context(void *cookie)
>> +{
>> +    struct arm_smmu_domain *smmu_domain = cookie;
>> +
>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>> +}
>> +
>>   static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>    unsigned long iova, size_t size,
>>    size_t granule,
>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>> arm_smmu_cmdq_ent *cmd,
>>   arm_smmu_cmdq_batch_submit(smmu, &cmds);
>>   }
>>   
> Here is the part of code in __arm_smmu_tlb_inv_range():
>>     if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>     /* Get the leaf page size */
>>     tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>
>>     /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>     cmd->tlbi.tg = (tg - 10) / 2;
>>
>>     /* Determine what level the granule is at */
>>     cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>
>>     num_pages = size >> tg;
>>     }
> When pSMMU supports RIL, we get the leaf page size by __ffs(smmu_domain->
> domain.pgsize_bitmap). In nested mode, it is determined by host
> PAGE_SIZE. If
> the host kernel and guest kernel has different translation granule (e.g.
> host 16K,
> guest 4K), __arm_smmu_tlb_inv_range() will issue an incorrect tlbi command.
> 
> Do you have any idea about this issue?

I think this is the same issue as the one reported by Chenxiang

https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c29901...@hisilicon.com/

In case RIL is not supported by the host, the next version will use the
smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range.
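
To make the mismatch concrete, here is a small userspace sketch of the
arithmetic quoted above. It is only an illustration under assumptions:
struct ril_fields, lowest_bit(), log2_ul() and ril_compute() are
hypothetical names standing in for the kernel helpers, and the pgsize
bitmaps used in the note below are example values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ril_fields {
	unsigned int tg;         /* TG encoding: 1, 2, 3 for 4K, 16K, 64K */
	unsigned int ttl;        /* level hint derived from the granule */
	unsigned long num_pages; /* range size in units of the leaf page size */
};

/* Index of the lowest set bit; stands in for the kernel's __ffs().
 * Uses a GCC/Clang builtin, @v must be non-zero. */
static unsigned int lowest_bit(unsigned long v)
{
	return (unsigned int)__builtin_ctzl(v);
}

/* Integer log2 of a non-zero value; stands in for the kernel's ilog2(). */
static unsigned int log2_ul(unsigned long v)
{
	return (unsigned int)(8 * sizeof(v) - 1) - (unsigned int)__builtin_clzl(v);
}

/* Same arithmetic as the quoted ARM_SMMU_FEAT_RANGE_INV branch. */
static struct ril_fields ril_compute(unsigned long pgsize_bitmap,
				     size_t granule, size_t size)
{
	struct ril_fields f;
	unsigned int tg = lowest_bit(pgsize_bitmap);

	f.tg = (tg - 10) / 2;
	f.ttl = 4 - ((log2_ul(granule) - 3) / (tg - 3));
	f.num_pages = size >> tg;
	return f;
}
```

With a 4K host bitmap (e.g. 0x40201000) and a 4K guest granule this gives
tg = 1, ttl = 3 and one page per 4K, as expected. With a 16K host bitmap
(e.g. 0x2004000) and the same 4K guest granule, the integer division yields
0, so ttl becomes 4 and a 4K range turns into num_pages = 0, which is the
incorrect command described above.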

Thanks

Eric

> 
> Best Regards,
> Kunkun Jiang
>> -static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t
>> size,
>> -  size_t granule, bool leaf,
>> -  struct arm_smmu_domain *smmu_domain)
>> +static void
>> +arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>> +  size_t granule, bool leaf, int ext_asid,
>> +  struct arm_smmu_domain *smmu_domain)
>>   {
>>   struct arm_smmu_cmdq_ent cmd = {
>>   .tlbi = {
>> @@ -1936,7 +1950,12 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>   },
>>   };
>>   -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>> +    cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>> +  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>> +    cmd.tlbi.asid    = ext_asid;
>> +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>> 

Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor

2021-04-01 Thread Auger Eric
Hi Shameer,
On 4/1/21 2:38 PM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: 01 April 2021 12:49
>> To: yuzenghui 
>> Cc: eric.auger@gmail.com; iommu@lists.linux-foundation.org;
>> linux-ker...@vger.kernel.org; k...@vger.kernel.org;
>> kvm...@lists.cs.columbia.edu; w...@kernel.org; m...@kernel.org;
>> robin.mur...@arm.com; j...@8bytes.org; alex.william...@redhat.com;
>> t...@semihalf.com; zhukeqian ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; wangxingang
>> ; jiangkunkun ;
>> jean-phili...@linaro.org; zhangfei@linaro.org; zhangfei@gmail.com;
>> vivek.gau...@arm.com; Shameerali Kolothum Thodi
>> ; nicoleots...@gmail.com;
>> lushenming ; vse...@nvidia.com; Wanghaibin (D)
>> 
>> Subject: Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than
>> one context descriptor
>>
>> Hi Zenghui,
>>
>> On 3/30/21 11:23 AM, Zenghui Yu wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> In preparation for vSVA, let's accept userspace provided configs
>>>> with more than one CD. We check the max CD against the host iommu
>>>> capability and also the format (linear versus 2 level).
>>>>
>>>> Signed-off-by: Eric Auger 
>>>> Signed-off-by: Shameer Kolothum
>> 
>>>> ---
>>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 -
>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 332d31c0680f..ab74a0289893 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -3038,14 +3038,17 @@ static int
>> arm_smmu_attach_pasid_table(struct
>>>> iommu_domain *domain,
>>>>   if (smmu_domain->s1_cfg.set)
>>>>   goto out;
>>>>   -    /*
>>>> - * we currently support a single CD so s1fmt and s1dss
>>>> - * fields are also ignored
>>>> - */
>>>> -    if (cfg->pasid_bits)
>>>> +    list_for_each_entry(master, &smmu_domain->devices,
>>>> domain_head) {
>>>> +    if (cfg->pasid_bits > master->ssid_bits)
>>>> +    goto out;
>>>> +    }
>>>> +    if (cfg->vendor_data.smmuv3.s1fmt ==
>>>> STRTAB_STE_0_S1FMT_64K_L2 &&
>>>> +    !(smmu->features &
>> ARM_SMMU_FEAT_2_LVL_CDTAB))
>>>>   goto out;
>>>>     smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>>>> +    smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
>>>> +    smmu_domain->s1_cfg.s1fmt =
>> cfg->vendor_data.smmuv3.s1fmt;
>>>
>>> And what about the SIDSS field?
>>>
>> I added this patch upon Shameer's request, to be more vSVA friendly.
>> However, this series does not really target multiple CD support. At the
>> moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2), I think.
>> For now, maybe I can just check that the s1dss field is 0x2. Or simply
>> remove this patch?
>>
>> Thoughts?
> 
> Right. This was useful for vSVA tests. But yes, to properly support multiple 
> CDs
> we need to pass the S1DSS from Qemu. And that requires further changes.
> So I think it's better to remove this patch and reject S1CDMAX != 0 cases.
OK I will remove it

Thanks

Eric
> 
> Thanks,
> Shameer
>
>>
>> Eric
> 

Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 4/1/21 8:11 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> +static int
>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device
>> *dev,
>> +  struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +    struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +    return -EINVAL;
>> +
>> +    if (!smmu)
>> +    return -EINVAL;
>> +
>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> +    return -EINVAL;
>> +
>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
> 
> I didn't find any code where we would emulate the CFGI_CD{_ALL} commands
> for guest and invalidate the stale CD entries on the physical side. Is
> PASID-cache type designed for that effect?
Yes it is. PASID-cache matches the CD table.
> 
>> +    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>> +    return -ENOENT;
>> +    }
>> +
>> +    if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>> +    return -EINVAL;
>> +
>> +    /* IOTLB invalidation */
>> +
>> +    switch (inv_info->granularity) {
>> +    case IOMMU_INV_GRANU_PASID:
>> +    {
>> +    struct iommu_inv_pasid_info *info =
>> +    _info->granu.pasid_info;
>> +    &inv_info->granu.pasid_info;
>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +    return -ENOENT;
>> +    if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>> +    return -EINVAL;
>> +
>> +    __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>> +    return 0;
>> +    }
>> +    case IOMMU_INV_GRANU_ADDR:
>> +    {
>> +    struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>> +    size_t size = info->nb_granules * info->granule_size;
>> +    bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>> +
>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +    return -ENOENT;
>> +
>> +    if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>> +    break;
>> +
>> +    arm_smmu_tlb_inv_range_domain(info->addr, size,
>> +  info->granule_size, leaf,
>> +  info->archid, smmu_domain);
>> +
>> +    arm_smmu_cmdq_issue_sync(smmu);
> 
> There is no need to issue one more SYNC.
Hmm yes, I did not notice it was already done by arm_smmu_cmdq_issue_cmdlist().

Thanks!

Eric
> 

Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 3/30/21 11:23 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> In preparation for vSVA, let's accept userspace provided configs
>> with more than one CD. We check the max CD against the host iommu
>> capability and also the format (linear versus 2 level).
>>
>> Signed-off-by: Eric Auger 
>> Signed-off-by: Shameer Kolothum 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 -
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 332d31c0680f..ab74a0289893 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3038,14 +3038,17 @@ static int arm_smmu_attach_pasid_table(struct
>> iommu_domain *domain,
>>   if (smmu_domain->s1_cfg.set)
>>   goto out;
>>   -    /*
>> - * we currently support a single CD so s1fmt and s1dss
>> - * fields are also ignored
>> - */
>> -    if (cfg->pasid_bits)
>> +    list_for_each_entry(master, &smmu_domain->devices,
>> domain_head) {
>> +    if (cfg->pasid_bits > master->ssid_bits)
>> +    goto out;
>> +    }
>> +    if (cfg->vendor_data.smmuv3.s1fmt ==
>> STRTAB_STE_0_S1FMT_64K_L2 &&
>> +    !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
>>   goto out;
>>     smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> +    smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
>> +    smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
> 
> And what about the SIDSS field?
> 
I added this patch upon Shameer's request, to be more vSVA friendly.
However, this series does not really target multiple CD support. At the
moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2), I think.
For now, maybe I can just check that the s1dss field is 0x2. Or simply
remove this patch?

Thoughts?
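
For the first option, the check could look roughly like the sketch below.
It is a standalone illustration under assumptions, not driver code: struct
guest_s1_cfg, struct host_caps and check_guest_s1_cfg() are hypothetical,
and the only value taken from the driver is 0x2 for
STRTAB_STE_1_S1DSS_SSID0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define S1DSS_SSID0 0x2 /* value quoted above for STRTAB_STE_1_S1DSS_SSID0 */

/* Hypothetical, trimmed-down view of the guest-provided config. */
struct guest_s1_cfg {
	uint8_t pasid_bits; /* guest S1CDMAX */
	uint8_t s1dss;      /* guest S1DSS */
	bool two_level;     /* guest asks for a 2-level CD table */
};

/* Hypothetical view of what the host side can do. */
struct host_caps {
	uint8_t ssid_bits;    /* smallest ssid_bits among attached masters */
	bool two_level_cdtab; /* host supports 2-level CD tables */
};

/* Return 0 if the config is acceptable, -EINVAL otherwise. */
static int check_guest_s1_cfg(const struct guest_s1_cfg *cfg,
			      const struct host_caps *caps)
{
	if (cfg->pasid_bits > caps->ssid_bits)
		return -EINVAL;
	if (cfg->two_level && !caps->two_level_cdtab)
		return -EINVAL;
	if (cfg->s1dss != S1DSS_SSID0)
		return -EINVAL;
	return 0;
}
```

This keeps the two checks from the patch and only adds the s1dss test, so
anything other than SSID0 behaviour would be rejected up front.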

Eric

Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 3/30/21 11:17 AM, Zenghui Yu wrote:
> On 2021/2/24 4:56, Eric Auger wrote:
>> @@ -1936,7 +1950,12 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>   },
>>   };
>>   -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>> +    cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>> +  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
> 
> If I understand it correctly, the true nested mode effectively gives us
> a *NS-EL1* StreamWorld. We should therefore use CMDQ_OP_TLBI_NH_VA to
> invalidate the stage-1 NS-EL1 entries, no matter E2H is selected or not.
> 

Yes at the moment you're right. Support for nested virt may induce some
changes here but we are not there. I will fix it and add a comment.
Thank you!

Best Regards

Eric
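
A minimal sketch of the agreed behaviour, for illustration only: guest stage 1 invalidations always use the NH opcode, and E2H only matters for host-owned stage 1. The helper name is hypothetical and the opcode values are quoted from memory, so treat them as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode values as recalled from arm-smmu-v3.h; assumptions, not verified. */
#define CMDQ_OP_TLBI_NH_VA	0x12
#define CMDQ_OP_TLBI_EL2_VA	0x22

/*
 * Hypothetical helper: pick the by-VA TLBI opcode.
 * guest_s1 corresponds to the "ext_asid >= 0" case in the quoted hunk.
 */
static uint8_t tlbi_va_opcode(bool guest_s1, bool host_e2h)
{
	/* Nested mode gives the guest an NS-EL1 StreamWorld: always NH. */
	if (guest_s1)
		return CMDQ_OP_TLBI_NH_VA;

	/* Host stage 1: EL2 opcode only when E2H is in use. */
	return host_e2h ? CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
}
```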


Re: [PATCH v13 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-03-26 Thread Auger Eric
Hi Jean,

On 3/2/21 10:26 AM, Jean-Philippe Brucker wrote:
> The SMMU provides a Stall model for handling page faults in platform
> devices. It is similar to PCIe PRI, but doesn't require devices to have
> their own translation cache. Instead, faulting transactions are parked
> and the OS is given a chance to fix the page tables and retry the
> transaction.
> 
> Enable stall for devices that support it (opt-in by firmware). When an
> event corresponds to a translation error, call the IOMMU fault handler.
> If the fault is recoverable, it will call us back to terminate or
> continue the stall.
> 
> To use stall device drivers need to enable IOMMU_DEV_FEAT_IOPF, which
> initializes the fault queue for the device.
> 
> Tested-by: Zhangfei Gao 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric
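
To make the stall flow concrete, here is a sketch of how a RESUME command could be packed from the field masks quoted below. It is a userspace approximation: GENMASK_ULL and FIELD_PREP are simplified stand-ins for the kernel macros (FIELD_PREP relies on the GCC/Clang builtin __builtin_ctzll), CMDQ_0_OP as bits 7:0 is an assumption since it is not part of the quoted hunk, and build_resume_cmd() is a hypothetical name.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel's bitfield helpers. */
#define GENMASK_ULL(h, l)	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))
#define FIELD_PREP(mask, val) \
	((((uint64_t)(val)) << __builtin_ctzll(mask)) & (mask))

/* Assumption: the opcode lives in bits 7:0 of the first command word. */
#define CMDQ_0_OP			GENMASK_ULL(7, 0)

/* Field definitions taken from the quoted patch. */
#define CMDQ_OP_RESUME			0x44
#define CMDQ_RESUME_0_RESP_TERM		0UL
#define CMDQ_RESUME_0_RESP_RETRY	1UL
#define CMDQ_RESUME_0_RESP_ABORT	2UL
#define CMDQ_RESUME_0_RESP		GENMASK_ULL(13, 12)
#define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
#define CMDQ_RESUME_1_STAG		GENMASK_ULL(15, 0)

/* Pack a RESUME command for the stalled transaction (sid, stag). */
static void build_resume_cmd(uint64_t cmd[2], uint32_t sid, uint16_t stag,
			     uint8_t resp)
{
	cmd[0] = FIELD_PREP(CMDQ_0_OP, CMDQ_OP_RESUME) |
		 FIELD_PREP(CMDQ_RESUME_0_RESP, resp) |
		 FIELD_PREP(CMDQ_RESUME_0_SID, sid);
	cmd[1] = FIELD_PREP(CMDQ_RESUME_1_STAG, stag);
}
```

The stag value is the one reported in the stall event, so that the SMMU can match the response to the parked transaction.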
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  43 
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  59 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 196 +-
>  3 files changed, 283 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 7b15b7580c6e..59af0bbd2f7b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -354,6 +354,13 @@
>  #define CMDQ_PRI_1_GRPID GENMASK_ULL(8, 0)
>  #define CMDQ_PRI_1_RESP  GENMASK_ULL(13, 12)
>  
> +#define CMDQ_RESUME_0_RESP_TERM  0UL
> +#define CMDQ_RESUME_0_RESP_RETRY 1UL
> +#define CMDQ_RESUME_0_RESP_ABORT 2UL
> +#define CMDQ_RESUME_0_RESP   GENMASK_ULL(13, 12)
> +#define CMDQ_RESUME_0_SID    GENMASK_ULL(63, 32)
> +#define CMDQ_RESUME_1_STAG   GENMASK_ULL(15, 0)
> +
>  #define CMDQ_SYNC_0_CS   GENMASK_ULL(13, 12)
>  #define CMDQ_SYNC_0_CS_NONE  0
>  #define CMDQ_SYNC_0_CS_IRQ   1
> @@ -370,6 +377,25 @@
>  
>  #define EVTQ_0_ID    GENMASK_ULL(7, 0)
>  
> +#define EVT_ID_TRANSLATION_FAULT 0x10
> +#define EVT_ID_ADDR_SIZE_FAULT   0x11
> +#define EVT_ID_ACCESS_FAULT  0x12
> +#define EVT_ID_PERMISSION_FAULT  0x13
> +
> +#define EVTQ_0_SSV   (1UL << 11)
> +#define EVTQ_0_SSID  GENMASK_ULL(31, 12)
> +#define EVTQ_0_SID   GENMASK_ULL(63, 32)
> +#define EVTQ_1_STAG  GENMASK_ULL(15, 0)
> +#define EVTQ_1_STALL (1UL << 31)
> +#define EVTQ_1_PnU   (1UL << 33)
> +#define EVTQ_1_InD   (1UL << 34)
> +#define EVTQ_1_RnW   (1UL << 35)
> +#define EVTQ_1_S2    (1UL << 39)
> +#define EVTQ_1_CLASS GENMASK_ULL(41, 40)
> +#define EVTQ_1_TT_READ   (1UL << 44)
> +#define EVTQ_2_ADDR  GENMASK_ULL(63, 0)
> +#define EVTQ_3_IPA   GENMASK_ULL(51, 12)
> +
>  /* PRI queue */
>  #define PRIQ_ENT_SZ_SHIFT    4
>  #define PRIQ_ENT_DWORDS  ((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
> @@ -464,6 +490,13 @@ struct arm_smmu_cmdq_ent {
>   enum pri_resp   resp;
>   } pri;
>  
> + #define CMDQ_OP_RESUME  0x44
> + struct {
> + u32 sid;
> + u16 stag;
> + u8  resp;
> + } resume;
> +
>   #define CMDQ_OP_CMD_SYNC    0x46
>   struct {
>   u64 msiaddr;
> @@ -522,6 +555,7 @@ struct arm_smmu_cmdq_batch {
>  
>  struct arm_smmu_evtq {
>   struct arm_smmu_queue   q;
> + struct iopf_queue   *iopf;
>   u32 max_stalls;
>  };
>  
> @@ -659,7 +693,9 @@ struct arm_smmu_master {
>   struct arm_smmu_stream  *streams;
>   unsigned int            num_streams;
>   bool                    ats_enabled;
> + bool                    stall_enabled;
>   bool                    sva_enabled;
> + bool                    iopf_enabled;
>   struct list_head        bonds;
>   unsigned int            ssid_bits;
>  };
> @@ -678,6 +714,7 @@ struct arm_smmu_domain {
>  
>   struct io_pgtable_ops   *pgtbl_ops;
>   bool                    non_strict;
> + bool                    stall_enabled;
>   atomic_t                nr_ats_masters;
>  
>   enum arm_smmu_domain_stage  stage;
> @@ -719,6 +756,7 @@ bool arm_smmu_master_sva_supported(struct arm_smmu_master 
> *master);
>  bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
>  int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
>  int arm_smmu_master_disable_sva(struct 

Re: [Linuxarm] Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-03-22 Thread Auger Eric
Hi Chenxiang,

On 3/22/21 7:40 AM, chenxiang (M) wrote:
> Hi Eric,
> 
> 
> On 2021/3/20 1:36, Auger Eric wrote:
>> Hi Chenxiang,
>>
>> On 3/4/21 8:55 AM, chenxiang (M) wrote:
>>> Hi Eric,
>>>
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> Implement domain-selective, pasid selective and page-selective
>>>> IOTLB invalidations.
>>>>
>>>> Signed-off-by: Eric Auger 
>>>>
>>>> ---
>>>>
>>>> v13 -> v14:
>>>> - Add domain invalidation
>>>> - do global inval when asid is not provided with addr
>>>>    granularity
>>>>
>>>> v7 -> v8:
>>>> - ASID based invalidation using iommu_inv_pasid_info
>>>> - check ARCHID/PASID flags in addr based invalidation
>>>> - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
>>>>
>>>> v6 -> v7
>>>> - check the uapi version
>>>>
>>>> v3 -> v4:
>>>> - adapt to changes in the uapi
>>>> - add support for leaf parameter
>>>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>>>    anymore
>>>>
>>>> v2 -> v3:
>>>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>>>
>>>> v1 -> v2:
>>>> - properly pass the asid
>>>> ---
>>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74
>>>> +
>>>>   1 file changed, 74 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 4c19a1114de4..df3adc49111c 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -2949,6 +2949,79 @@ static void
>>>> arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>>>   mutex_unlock(&smmu_domain->init_mutex);
>>>>   }
>>>>   +static int
>>>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct
>>>> device *dev,
>>>> +  struct iommu_cache_invalidate_info *inv_info)
>>>> +{
>>>> +    struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>> +
>>>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>>> +    return -EINVAL;
>>>> +
>>>> +    if (!smmu)
>>>> +    return -EINVAL;
>>>> +
>>>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>>>> +    return -EINVAL;
>>>> +
>>>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>>>> +    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>>>> +    return -ENOENT;
>>>> +    }
>>>> +
>>>> +    if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>>>> +    return -EINVAL;
>>>> +
>>>> +    /* IOTLB invalidation */
>>>> +
>>>> +    switch (inv_info->granularity) {
>>>> +    case IOMMU_INV_GRANU_PASID:
>>>> +    {
>>>> +    struct iommu_inv_pasid_info *info =
>>>> +    &inv_info->granu.pasid_info;
>>>> +
>>>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>>> +    return -ENOENT;
>>>> +    if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>>>> +    return -EINVAL;
>>>> +
>>>> +    __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>>>> +    return 0;
>>>> +    }
>>>> +    case IOMMU_INV_GRANU_ADDR:
>>>> +    {
>>>> +    struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>>>> +    size_t size = info->nb_granules * info->granule_size;
>>>> +    bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>>>> +
>>>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>>> +    return -ENOENT;
>>>> +
>>>> +    if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>>>> +    break;
>>>> +
>>>> +    arm_smmu_tlb_inv_range_domai

Re: [PATCH v13 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-03-19 Thread Auger Eric
Hi Jean,

On 3/2/21 10:26 AM, Jean-Philippe Brucker wrote:
> The SMMU provides a Stall model for handling page faults in platform
> devices. It is similar to PCIe PRI, but doesn't require devices to have
> their own translation cache. Instead, faulting transactions are parked
> and the OS is given a chance to fix the page tables and retry the
> transaction.
> 
> Enable stall for devices that support it (opt-in by firmware). When an
> event corresponds to a translation error, call the IOMMU fault handler.
> If the fault is recoverable, it will call us back to terminate or
> continue the stall.
> 
> To use stall device drivers need to enable IOMMU_DEV_FEAT_IOPF, which
> initializes the fault queue for the device.
> 
> Tested-by: Zhangfei Gao 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric
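
As a companion to the description above, here is a sketch of how a stall event record could be decoded with the EVTQ_* fields from the quoted patch. Assumptions: GENMASK_ULL and FIELD_GET are simplified userspace stand-ins for the kernel macros (FIELD_GET uses the GCC/Clang builtin __builtin_ctzll), the single-bit flags are written with 1ULL instead of 1UL for portability, and struct stall_event and decode_event() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's bitfield helpers. */
#define GENMASK_ULL(h, l)	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))
#define FIELD_GET(mask, reg)	(((reg) & (mask)) >> __builtin_ctzll(mask))

/* Field layout taken from the quoted patch. */
#define EVTQ_0_ID			GENMASK_ULL(7, 0)
#define EVT_ID_TRANSLATION_FAULT	0x10
#define EVT_ID_PERMISSION_FAULT		0x13
#define EVTQ_0_SSV			(1ULL << 11)
#define EVTQ_0_SSID			GENMASK_ULL(31, 12)
#define EVTQ_0_SID			GENMASK_ULL(63, 32)
#define EVTQ_1_STAG			GENMASK_ULL(15, 0)
#define EVTQ_1_STALL			(1ULL << 31)
#define EVTQ_1_RnW			(1ULL << 35)
#define EVTQ_2_ADDR			GENMASK_ULL(63, 0)

/* Hypothetical decoded view of one event record. */
struct stall_event {
	uint8_t  id;
	bool     stalled;	/* transaction is parked, needs a RESUME */
	bool     ssid_valid;
	uint32_t ssid;
	uint32_t sid;
	uint16_t stag;		/* token to echo back in the RESUME command */
	bool     read;
	uint64_t addr;
};

/* Return true if evt[] is one of the four fault events listed in the patch. */
static bool decode_event(const uint64_t evt[4], struct stall_event *out)
{
	out->id = FIELD_GET(EVTQ_0_ID, evt[0]);
	if (out->id < EVT_ID_TRANSLATION_FAULT ||
	    out->id > EVT_ID_PERMISSION_FAULT)
		return false;

	out->ssid_valid = !!(evt[0] & EVTQ_0_SSV);
	out->ssid = out->ssid_valid ? FIELD_GET(EVTQ_0_SSID, evt[0]) : 0;
	out->sid = FIELD_GET(EVTQ_0_SID, evt[0]);
	out->stag = FIELD_GET(EVTQ_1_STAG, evt[1]);
	out->stalled = !!(evt[1] & EVTQ_1_STALL);
	out->read = !!(evt[1] & EVTQ_1_RnW);
	out->addr = FIELD_GET(EVTQ_2_ADDR, evt[2]);
	return true;
}
```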

> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  43 
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  59 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 196 +-
>  3 files changed, 283 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 7b15b7580c6e..59af0bbd2f7b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -354,6 +354,13 @@
>  #define CMDQ_PRI_1_GRPID GENMASK_ULL(8, 0)
>  #define CMDQ_PRI_1_RESP  GENMASK_ULL(13, 12)
>  
> +#define CMDQ_RESUME_0_RESP_TERM  0UL
> +#define CMDQ_RESUME_0_RESP_RETRY 1UL
> +#define CMDQ_RESUME_0_RESP_ABORT 2UL
> +#define CMDQ_RESUME_0_RESP   GENMASK_ULL(13, 12)
> +#define CMDQ_RESUME_0_SID    GENMASK_ULL(63, 32)
> +#define CMDQ_RESUME_1_STAG   GENMASK_ULL(15, 0)
> +
>  #define CMDQ_SYNC_0_CS   GENMASK_ULL(13, 12)
>  #define CMDQ_SYNC_0_CS_NONE  0
>  #define CMDQ_SYNC_0_CS_IRQ   1
> @@ -370,6 +377,25 @@
>  
>  #define EVTQ_0_ID    GENMASK_ULL(7, 0)
>  
> +#define EVT_ID_TRANSLATION_FAULT 0x10
> +#define EVT_ID_ADDR_SIZE_FAULT   0x11
> +#define EVT_ID_ACCESS_FAULT  0x12
> +#define EVT_ID_PERMISSION_FAULT  0x13
> +
> +#define EVTQ_0_SSV   (1UL << 11)
> +#define EVTQ_0_SSID  GENMASK_ULL(31, 12)
> +#define EVTQ_0_SID   GENMASK_ULL(63, 32)
> +#define EVTQ_1_STAG  GENMASK_ULL(15, 0)
> +#define EVTQ_1_STALL (1UL << 31)
> +#define EVTQ_1_PnU   (1UL << 33)
> +#define EVTQ_1_InD   (1UL << 34)
> +#define EVTQ_1_RnW   (1UL << 35)
> +#define EVTQ_1_S2    (1UL << 39)
> +#define EVTQ_1_CLASS GENMASK_ULL(41, 40)
> +#define EVTQ_1_TT_READ   (1UL << 44)
> +#define EVTQ_2_ADDR  GENMASK_ULL(63, 0)
> +#define EVTQ_3_IPA   GENMASK_ULL(51, 12)
> +
>  /* PRI queue */
>  #define PRIQ_ENT_SZ_SHIFT    4
>  #define PRIQ_ENT_DWORDS  ((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
> @@ -464,6 +490,13 @@ struct arm_smmu_cmdq_ent {
>   enum pri_resp   resp;
>   } pri;
>  
> + #define CMDQ_OP_RESUME  0x44
> + struct {
> + u32 sid;
> + u16 stag;
> + u8  resp;
> + } resume;
> +
>   #define CMDQ_OP_CMD_SYNC    0x46
>   struct {
>   u64 msiaddr;
> @@ -522,6 +555,7 @@ struct arm_smmu_cmdq_batch {
>  
>  struct arm_smmu_evtq {
>   struct arm_smmu_queue   q;
> + struct iopf_queue   *iopf;
>   u32 max_stalls;
>  };
>  
> @@ -659,7 +693,9 @@ struct arm_smmu_master {
>   struct arm_smmu_stream  *streams;
>   unsigned int            num_streams;
>   bool                    ats_enabled;
> + bool                    stall_enabled;
>   bool                    sva_enabled;
> + bool                    iopf_enabled;
>   struct list_head        bonds;
>   unsigned int            ssid_bits;
>  };
> @@ -678,6 +714,7 @@ struct arm_smmu_domain {
>  
>   struct io_pgtable_ops   *pgtbl_ops;
>   bool                    non_strict;
> + bool                    stall_enabled;
>   atomic_t                nr_ats_masters;
>  
>   enum arm_smmu_domain_stage  stage;
> @@ -719,6 +756,7 @@ bool arm_smmu_master_sva_supported(struct arm_smmu_master 
> *master);
>  bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
>  int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
>  int arm_smmu_master_disable_sva(struct 

Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-03-19 Thread Auger Eric
Hi Chenxiang,

On 3/4/21 8:55 AM, chenxiang (M) wrote:
> Hi Eric,
> 
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> Implement domain-selective, pasid selective and page-selective
>> IOTLB invalidations.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v13 -> v14:
>> - Add domain invalidation
>> - do global inval when asid is not provided with addr
>>   granularity
>>
>> v7 -> v8:
>> - ASID based invalidation using iommu_inv_pasid_info
>> - check ARCHID/PASID flags in addr based invalidation
>> - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
>>
>> v6 -> v7
>> - check the uapi version
>>
>> v3 -> v4:
>> - adapt to changes in the uapi
>> - add support for leaf parameter
>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>   anymore
>>
>> v2 -> v3:
>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>
>> v1 -> v2:
>> - properly pass the asid
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74 +
>>  1 file changed, 74 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 4c19a1114de4..df3adc49111c 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2949,6 +2949,79 @@ static void arm_smmu_detach_pasid_table(struct 
>> iommu_domain *domain)
>>  mutex_unlock(&smmu_domain->init_mutex);
>>  }
>>  
>> +static int
>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>> +  struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +return -EINVAL;
>> +
>> +if (!smmu)
>> +return -EINVAL;
>> +
>> +if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> +return -EINVAL;
>> +
>> +if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>> +inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>> +return -ENOENT;
>> +}
>> +
>> +if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>> +return -EINVAL;
>> +
>> +/* IOTLB invalidation */
>> +
>> +switch (inv_info->granularity) {
>> +case IOMMU_INV_GRANU_PASID:
>> +{
>> +struct iommu_inv_pasid_info *info =
>> +&inv_info->granu.pasid_info;
>> +
>> +if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +return -ENOENT;
>> +if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>> +return -EINVAL;
>> +
>> +__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>> +return 0;
>> +}
>> +case IOMMU_INV_GRANU_ADDR:
>> +{
>> +struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>> +size_t size = info->nb_granules * info->granule_size;
>> +bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>> +
>> +if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +return -ENOENT;
>> +
>> +if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>> +break;
>> +
>> +arm_smmu_tlb_inv_range_domain(info->addr, size,
>> +  info->granule_size, leaf,
>> +  info->archid, smmu_domain);
> 
> Is it possible to add a check before the function to make sure that
> info->granule_size can be recognized by the SMMU?
> There is a scenario which will cause a TLBI issue: the RIL feature is
> enabled on the guest but disabled on the host, so the host
> only invalidates 4K/2M/1G at a time, but from QEMU,
> info->nb_granules is always 1 and info->granule_size = size.
> If size is not equal to 4K or 2M or 1G (for example size = granule_size
> = 5M), it will only invalidate part of the range it wants to invalidate.
> 
> I think maybe we can add a check here: if RIL is not enabled and also
> size is not the granule_size (4K/2M/1G) supported by
> SMMU hardware, can we just simply use the smallest granule_size
> supported by hardware all the time?
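
One possible shape for the fallback suggested above, as a sketch only: if the requested granule is not a page size the hardware supports, re-express the same range in units of the smallest supported page size. The struct and function names are hypothetical, pgsize_bitmap is assumed to have one bit set per supported page size, and overflow of granule_size * nb_granules is not handled.

```c
#include <stdint.h>

/* Hypothetical result type: a range expressed as count * granule. */
struct inv_range {
	uint64_t granule_size;
	uint64_t nb_granules;
};

/*
 * Return the range unchanged if granule_size is a supported page size,
 * otherwise cover the same number of bytes (rounded up) with the
 * smallest supported page size.
 */
static struct inv_range normalize_inv_range(uint64_t pgsize_bitmap,
					    uint64_t granule_size,
					    uint64_t nb_granules)
{
	struct inv_range r = { granule_size, nb_granules };
	uint64_t smallest, total;

	/* Supported granule: a single bit that is present in the bitmap. */
	if (granule_size && !(granule_size & (granule_size - 1)) &&
	    (pgsize_bitmap & granule_size))
		return r;

	/* Nothing sensible to fall back to. */
	if (!pgsize_bitmap)
		return r;

	/* Lowest set bit of the bitmap is the smallest page size. */
	smallest = pgsize_bitmap & (~pgsize_bitmap + 1);
	total = granule_size * nb_granules;

	r.granule_size = smallest;
	r.nb_granules = (total + smallest - 1) / smallest;
	return r;
}
```

With 4K/2M/1G supported, the 5M example from the mail becomes 1280 invalidations of 4K, which is slower but covers the whole range.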
> 
>> +
>> +arm_smmu_cmdq_issue_sync(smmu);
>> +return 0;
>> +}
>> +case IOMMU_INV_GRANU_DOMAIN:
>> +break;
> 
> I checked the QEMU code
> (https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8): for opcode
> CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from the guest OS
> it calls smmu_inv_notifiers_all() to unmap all notifiers of all MRs,
> but it does not seem to set event.entry.granularity, which I think
> should be set to IOMMU_INV_GRAN_ADDR.
This is because IOMMU_INV_GRAN_ADDR = 0. But for clarity I should rather
set it explicitly ;-)
> BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL, it needs to unmap
> size = 0x1 on 48bit 

Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-19 Thread Auger Eric
Hi Krishna,

On 3/18/21 1:16 AM, Krishna Reddy wrote:
> Tested-by: Krishna Reddy 
> 
> Validated nested translations with NVMe PCI device assigned to Guest VM. 
> Tested with both v12 and v13 of Jean-Philippe's patches as base.

Many thanks for that.
> 
>> This is based on Jean-Philippe's
>> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
>> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/
> 
> With Jean-Philippe's V13, Patch 12 of this series has a conflict that had to 
> be resolved manually.

Yep I will respin accordingly.

Best Regards

Eric
> 
> -KR
> 
> 



Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table

2021-03-19 Thread Auger Eric
Hi Keqian,

On 3/2/21 9:35 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> On attach_pasid_table() we program STE S1 related info set
>> by the guest into the actual physical STEs. At minimum
>> we need to program the context descriptor GPA and compute
>> whether the stage1 is translated/bypassed or aborted.
>>
>> On detach, the stage 1 config is unset and the abort flag is
>> unset.
>>
>> Signed-off-by: Eric Auger 
>>
> [...]
> 
>> +
>> +/*
>> + * we currently support a single CD so s1fmt and s1dss
>> + * fields are also ignored
>> + */
>> +if (cfg->pasid_bits)
>> +goto out;
>> +
>> +smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> Only the "cdtab_dma" field of "cdcfg" is set, so we are not able to locate a
> specific CD using arm_smmu_get_cd_ptr().
> 
> Maybe we'd better use a specialized function to fill the other fields of
> "cdcfg", or add a sanity check in arm_smmu_get_cd_ptr()
> to prevent calling it in nested mode?
> 
> As we currently only call arm_smmu_get_cd_ptr() during finalise_s1(), no
> problem was found. Just a suggestion ;-)

Forgive me for the delay. Yes, I can indeed make sure that code is not
called in nested mode. Could you please detail why you would need to
call arm_smmu_get_cd_ptr()?

Thanks

Eric
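
A sketch of the kind of sanity check being discussed, under loud assumptions: the types below are simplified, hypothetical stand-ins (linear CD table only, no 2-level handling), and CD_DWORDS = 8 is an assumed descriptor size. In nested mode only the guest-owned cdtab_dma is known, so there is no host pointer to hand out.

```c
#include <stddef.h>
#include <stdint.h>

#define CD_DWORDS	8	/* assumed size of one CD in 64-bit words */

enum domain_stage_sketch { STAGE_S1, STAGE_S2, STAGE_NESTED };

/* Hypothetical, linear-table-only view of the CD configuration. */
struct cdcfg_sketch {
	uint64_t *cdtab;	/* host virtual address, NULL in nested mode */
	uint64_t cdtab_dma;	/* in nested mode: guest-provided address */
	unsigned int num_entries;
};

struct smmu_domain_sketch {
	enum domain_stage_sketch stage;
	struct cdcfg_sketch cdcfg;
};

/* Return a pointer to the CD for ssid, or NULL if it cannot be located. */
static uint64_t *get_cd_ptr(struct smmu_domain_sketch *d, uint32_t ssid)
{
	/* The CD table belongs to the guest: refuse to walk it. */
	if (d->stage == STAGE_NESTED)
		return NULL;

	if (!d->cdcfg.cdtab || ssid >= d->cdcfg.num_entries)
		return NULL;

	return d->cdcfg.cdtab + (size_t)ssid * CD_DWORDS;
}
```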
> 
> Thanks,
> Keqian
> 
> 
>> +smmu_domain->s1_cfg.set = true;
>> +smmu_domain->abort = false;
>> +break;
>> +default:
>> +goto out;
>> +}
>> +spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +arm_smmu_install_ste_for_dev(master);
>> +spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +ret = 0;
>> +out:
>> +mutex_unlock(&smmu_domain->init_mutex);
>> +return ret;
>> +}
>> +
>> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct arm_smmu_master *master;
>> +unsigned long flags;
>> +
>> +mutex_lock(&smmu_domain->init_mutex);
>> +
>> +if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +goto unlock;
>> +
>> +smmu_domain->s1_cfg.set = false;
>> +smmu_domain->abort = false;
>> +
>> +spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +arm_smmu_install_ste_for_dev(master);
>> +spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +
>> +unlock:
>> +mutex_unlock(&smmu_domain->init_mutex);
>> +}
>> +
>>  static bool arm_smmu_dev_has_feature(struct device *dev,
>>   enum iommu_dev_features feat)
>>  {
>> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>>  .of_xlate   = arm_smmu_of_xlate,
>>  .get_resv_regions   = arm_smmu_get_resv_regions,
>>  .put_resv_regions   = generic_iommu_put_resv_regions,
>> +.attach_pasid_table = arm_smmu_attach_pasid_table,
>> +.detach_pasid_table = arm_smmu_detach_pasid_table,
>>  .dev_has_feat   = arm_smmu_dev_has_feature,
>>  .dev_feat_enabled   = arm_smmu_dev_feature_enabled,
>>  .dev_enable_feat= arm_smmu_dev_enable_feature,
>>
> 



Re: [PATCH 0/3] Add support for ACPI VIOT

2021-03-19 Thread Auger Eric
Hi Jean,

On 3/16/21 8:16 PM, Jean-Philippe Brucker wrote:
> Add a driver for the ACPI VIOT table, which enables virtio-iommu on
> non-devicetree platforms, including x86. This series depends on the
> ACPICA changes of patch 1, which will be included in next release [1]
> and pulled into Linux.
> 
> The Virtual I/O Translation table (VIOT) describes the topology of
> para-virtual I/O translation devices and the endpoints they manage.
> It was recently approved for inclusion into the ACPI standard [2].
> A provisional version of the specification can be found at [3].
> 
> After discussing non-devicetree support for virtio-iommu at length
> [4][5][6] we concluded that it should use this new ACPI table. And for
> platforms that don't implement either devicetree or ACPI, a structure
> that uses roughly the same format [6] can be built into the device.
> 
> [1] https://github.com/acpica/acpica/pull/666
> [2] https://lore.kernel.org/linux-iommu/20210218233943.gh702...@redhat.com/
> [3] https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf
> [4] 
> https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> [5] 
> https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> [6] 
> https://lore.kernel.org/linux-iommu/20200821131540.2801801-1-jean-phili...@linaro.org/

Do you have a qemu branch to share for us to start exercising different
kinds of topology?

Thanks

Eric
> 
> Jean-Philippe Brucker (3):
>   ACPICA: iASL: Add definitions for the VIOT table
>   ACPI: Add driver for the VIOT table
>   iommu/virtio: Enable x86 support
> 
>  drivers/acpi/Kconfig |   3 +
>  drivers/iommu/Kconfig|   4 +-
>  drivers/acpi/Makefile|   2 +
>  include/acpi/actbl3.h|  67 ++
>  include/linux/acpi_viot.h|  26 +++
>  drivers/acpi/bus.c   |   2 +
>  drivers/acpi/scan.c  |   6 +
>  drivers/acpi/viot.c  | 406 +++
>  drivers/iommu/virtio-iommu.c |   3 +
>  MAINTAINERS  |   8 +
>  10 files changed, 526 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/viot.c
> 



Re: [PATCH 2/3] ACPI: Add driver for the VIOT table

2021-03-19 Thread Auger Eric
Hi Jean,

On 3/16/21 8:16 PM, Jean-Philippe Brucker wrote:
> The ACPI Virtual I/O Translation Table describes topology of
> para-virtual platforms. For now it describes the relation between
> virtio-iommu and the endpoints it manages. Supporting that requires
> three steps:
> 
> (1) acpi_viot_init(): parse the VIOT table, build a list of endpoints
> and vIOMMUs.
> 
> (2) acpi_viot_set_iommu_ops(): when the vIOMMU driver is loaded and the
> device probed, register it to the VIOT driver. This step is required
> because unlike similar drivers, VIOT doesn't create the vIOMMU
> device.
> 
> (3) acpi_viot_dma_setup(): when the endpoint is initialized, find the
> vIOMMU and register the endpoint's DMA ops.
> 
> If step (3) happens before step (2), it is deferred until the IOMMU is
> initialized, then retried.
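
The ordering between steps (2) and (3) can be sketched as follows. This is an illustration under assumptions, not the patch's code: all names are hypothetical, EPROBE_DEFER is defined locally because it is a kernel-internal errno, and the return convention (0 = not a VIOT endpoint, positive = handled, negative = error) is inferred from the "ret > 0 ? 0 : ret" test in the scan.c hunk of this patch.

```c
#include <stddef.h>

/* Kernel-internal errno, defined here so the sketch builds in userspace. */
#define EPROBE_DEFER	517

/* Hypothetical vIOMMU: ops stays NULL until its driver registers (step 2). */
struct viommu_sketch {
	const void *ops;
};

/* Hypothetical endpoint: viommu is NULL if VIOT does not describe it. */
struct endpoint_sketch {
	struct viommu_sketch *viommu;
	int dma_ready;
};

/* Step (3): 0 = not ours, 1 = configured, negative = error or deferral. */
static int viot_dma_setup_sketch(struct endpoint_sketch *ep)
{
	if (!ep->viommu)
		return 0;

	/* vIOMMU driver not registered yet: ask the core to retry later. */
	if (!ep->viommu->ops)
		return -EPROBE_DEFER;

	ep->dma_ready = 1;
	return 1;
}

/* Caller side, mirroring the shape of the scan.c change. */
static int dma_configure_sketch(struct endpoint_sketch *ep, int *used_fallback)
{
	int ret = viot_dma_setup_sketch(ep);

	*used_fallback = 0;
	if (ret)
		return ret > 0 ? 0 : ret;

	/* Not a VIOT endpoint: stand-in for the existing IORT path. */
	*used_fallback = 1;
	return 0;
}
```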
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/acpi/Kconfig |   3 +
>  drivers/iommu/Kconfig|   1 +
>  drivers/acpi/Makefile|   2 +
>  include/linux/acpi_viot.h|  26 +++
>  drivers/acpi/bus.c   |   2 +
>  drivers/acpi/scan.c  |   6 +
>  drivers/acpi/viot.c  | 406 +++
>  drivers/iommu/virtio-iommu.c |   3 +
>  MAINTAINERS  |   8 +
>  9 files changed, 457 insertions(+)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/viot.c
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index eedec61e3476..3758c6940ed7 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -526,6 +526,9 @@ endif
>  
>  source "drivers/acpi/pmic/Kconfig"
>  
> +config ACPI_VIOT
> + bool
> +
>  endif # ACPI
>  
>  config X86_PM_TIMER
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 192ef8f61310..2819b5c8ec30 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -403,6 +403,7 @@ config VIRTIO_IOMMU
>   depends on ARM64
>   select IOMMU_API
>   select INTERVAL_TREE
> + select ACPI_VIOT if ACPI
>   help
> Para-virtualised IOMMU driver with virtio.
>  
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 700b41adf2db..a6e644c48987 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -118,3 +118,5 @@ video-objs += acpi_video.o 
> video_detect.o
>  obj-y += dptf/
>  
>  obj-$(CONFIG_ARM64)  += arm64/
> +
> +obj-$(CONFIG_ACPI_VIOT)  += viot.o
> diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
> new file mode 100644
> index ..1f5837595488
> --- /dev/null
> +++ b/include/linux/acpi_viot.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ACPI_VIOT_H__
> +#define __ACPI_VIOT_H__
> +
> +#include 
> +
> +#ifdef CONFIG_ACPI_VIOT
> +void __init acpi_viot_init(void);
> +int acpi_viot_dma_setup(struct device *dev, enum dev_dma_attr attr);
> +int acpi_viot_set_iommu_ops(struct device *dev, struct iommu_ops *ops);
> +#else
> +static inline void acpi_viot_init(void) {}
> +static inline int acpi_viot_dma_setup(struct device *dev,
> +   enum dev_dma_attr attr)
> +{
> + return 0;
> +}
> +static inline int acpi_viot_set_iommu_ops(struct device *dev,
> +   struct iommu_ops *ops)
> +{
> + return -ENODEV;
> +}
> +#endif
> +
> +#endif /* __ACPI_VIOT_H__ */
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index be7da23fad76..f9a965c6617e 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -27,6 +27,7 @@
>  #include 
>  #endif
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1338,6 +1339,7 @@ static int __init acpi_init(void)
>  
>   pci_mmcfg_late_init();
>   acpi_iort_init();
> + acpi_viot_init();
>   acpi_scan_init();
>   acpi_ec_init();
>   acpi_debugfs_init();
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index a184529d8fa4..4af01fddd94c 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1506,12 +1507,17 @@ int acpi_dma_configure_id(struct device *dev, enum 
> dev_dma_attr attr,
>  {
>   const struct iommu_ops *iommu;
>   u64 dma_addr = 0, size = 0;
> + int ret;
>  
>   if (attr == DEV_DMA_NOT_SUPPORTED) {
>   set_dma_ops(dev, &dma_dummy_ops);
>   return 0;
>   }
>  
> + ret = acpi_viot_dma_setup(dev, attr);
> + if (ret)
> + return ret > 0 ? 0 : ret;
> +
>   iort_dma_setup(dev, &dma_addr, &size);
>  
>   iommu = iort_iommu_configure_id(dev, input_id);
> diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
> new file mode 100644
> index ..57a092e8551b
> --- /dev/null
> +++ b/drivers/acpi/viot.c
> @@ -0,0 +1,406 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtual I/O 

Re: [PATCH 1/3] ACPICA: iASL: Add definitions for the VIOT table

2021-03-18 Thread Auger Eric
Hi Jean,

On 3/16/21 8:16 PM, Jean-Philippe Brucker wrote:
> Just here for reference, don't merge!
> 
> The actual commits will be pulled from the next ACPICA release.
> I squashed the three relevant commits:
> 
> ACPICA commit fc4e33319c1ee08f20f5c44853dd8426643f6dfd
> ACPICA commit 2197e354fb5dcafaddd2016ffeb0620e5bc3d5e2
> ACPICA commit 856a96fdf4b51b2b8da17529df0255e6f51f1b5b
> 
> Link: https://github.com/acpica/acpica/commit/fc4e3331
> Link: https://github.com/acpica/acpica/commit/2197e354
> Link: https://github.com/acpica/acpica/commit/856a96fd
> Signed-off-by: Bob Moore 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  include/acpi/actbl3.h | 67 +++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/include/acpi/actbl3.h b/include/acpi/actbl3.h
> index df5f4b27f3aa..09d15898e9a8 100644
> --- a/include/acpi/actbl3.h
> +++ b/include/acpi/actbl3.h
> @@ -33,6 +33,7 @@
>  #define ACPI_SIG_TCPA   "TCPA"   /* Trusted Computing Platform 
> Alliance table */
>  #define ACPI_SIG_TPM2   "TPM2"   /* Trusted Platform Module 2.0 
> H/W interface table */
>  #define ACPI_SIG_UEFI   "UEFI"   /* Uefi Boot Optimization Table 
> */
> +#define ACPI_SIG_VIOT   "VIOT"   /* Virtual I/O Translation 
> Table */
>  #define ACPI_SIG_WAET   "WAET"   /* Windows ACPI Emulated 
> devices Table */
>  #define ACPI_SIG_WDAT   "WDAT"   /* Watchdog Action Table */
>  #define ACPI_SIG_WDDT   "WDDT"   /* Watchdog Timer Description 
> Table */
> @@ -483,6 +484,72 @@ struct acpi_table_uefi {
>   u16 data_offset;/* Offset of remaining data in table */
>  };
>  
> +/***
> + *
> + * VIOT - Virtual I/O Translation Table
> + *Version 1
For other tables I see a "Conforms to ../.." line. Shouldn't we have
such a section too?
> + *
> + 
> **/
> +
> +struct acpi_table_viot {
> + struct acpi_table_header header;/* Common ACPI table header */
> + u16 node_count;
> + u16 node_offset;
> + u8 reserved[8];
> +};
> +
> +/* VIOT subtable header */
> +
> +struct acpi_viot_header {
> + u8 type;
> + u8 reserved;
> + u16 length;
> +};
> +
> +/* Values for Type field above */
> +
> +enum acpi_viot_node_type {
> + ACPI_VIOT_NODE_PCI_RANGE = 0x01,
> + ACPI_VIOT_NODE_MMIO = 0x02,
> + ACPI_VIOT_NODE_VIRTIO_IOMMU_PCI = 0x03,
> + ACPI_VIOT_NODE_VIRTIO_IOMMU_MMIO = 0x04,
> + ACPI_VIOT_RESERVED = 0x05
> +};
> +
> +/* VIOT subtables */
> +
> +struct acpi_viot_pci_range {
> + struct acpi_viot_header header;
> + u32 endpoint_start;
> + u16 segment_start;
> + u16 segment_end;
> + u16 bdf_start;
> + u16 bdf_end;
> + u16 output_node;
> + u8 reserved[6];
> +};
> +
> +struct acpi_viot_mmio {
> + struct acpi_viot_header header;
> + u32 endpoint;
> + u64 base_address;
> + u16 output_node;
> + u8 reserved[6];
> +};
> +
> +struct acpi_viot_virtio_iommu_pci {
> + struct acpi_viot_header header;
> + u16 segment;
> + u16 bdf;
> + u8 reserved[8];
> +};
> +
> +struct acpi_viot_virtio_iommu_mmio {
> + struct acpi_viot_header header;
> + u8 reserved[4];
> + u64 base_address;
> +};
> +
>  
> /***
>   *
>   * WAET - Windows ACPI Emulated devices Table
> 

Besides
Reviewed-by: Eric Auger 

Thanks

Eric
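
For readers following the layout above, here is a sketch of how the node list could be walked using node_offset, node_count and the per-node length field. Assumptions: the function name and the counts[] convention are hypothetical, the table is passed as a raw byte buffer, and the 16-bit length is read byte by byte as little-endian so the sketch does not depend on host endianness or struct packing.

```c
#include <stddef.h>
#include <stdint.h>

/* Node types from the quoted enum. */
#define VIOT_NODE_PCI_RANGE		0x01
#define VIOT_NODE_MMIO			0x02
#define VIOT_NODE_VIRTIO_IOMMU_PCI	0x03
#define VIOT_NODE_VIRTIO_IOMMU_MMIO	0x04

/* Size of the subtable header: u8 type, u8 reserved, u16 length. */
#define VIOT_NODE_HDR_LEN		4

/*
 * Count nodes per type. counts[1..4] are indexed by node type and
 * counts[0] collects unknown types. Returns 0 on success, -1 if a node
 * is truncated or reports a length smaller than its own header.
 */
static int viot_count_nodes(const uint8_t *table, size_t table_len,
			    uint16_t node_offset, uint16_t node_count,
			    unsigned int counts[5])
{
	size_t off = node_offset;
	uint16_t i;

	for (i = 0; i < node_count; i++) {
		uint8_t type;
		uint16_t length;

		if (off + VIOT_NODE_HDR_LEN > table_len)
			return -1;

		type = table[off];
		length = (uint16_t)(table[off + 2] | (table[off + 3] << 8));

		if (length < VIOT_NODE_HDR_LEN || off + length > table_len)
			return -1;

		if (type >= VIOT_NODE_PCI_RANGE &&
		    type <= VIOT_NODE_VIRTIO_IOMMU_MMIO)
			counts[type]++;
		else
			counts[0]++;

		/* Nodes are variable-sized: advance by the reported length. */
		off += length;
	}

	return 0;
}
```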



Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-16 Thread Auger Eric
Hi Krishna,
On 3/15/21 7:04 PM, Krishna Reddy wrote:
> Tested-by: Krishna Reddy 
> 
>> 1) pass the guest stage 1 configuration
> 
> Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along 
> with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch 
> series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8. 
> NVMe PCIe device is functional with 2-stage translations and no issues 
> observed.
Thank you very much for your testing efforts. For your info, there are
more recent kernel series:
[PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23)
[PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)

working along with QEMU RFC
[RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)

If you have cycles to test with those, this would be highly appreciated.

Thanks

Eric
> 
> -KR
> 



Re: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING

2021-03-15 Thread Auger Eric
Hi Christoph,

On 3/14/21 4:58 PM, Christoph Hellwig wrote:
> On Sun, Mar 14, 2021 at 11:44:52AM +0100, Auger Eric wrote:
>> As mentioned by Robin, there are series planning to use
>> DOMAIN_ATTR_NESTING to get info about the nested caps of the iommu (ARM
>> and Intel):
>>
>> [Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs
>> patches 1, 2, 3
>>
>> Is the plan to introduce a new domain_get_nesting_info ops then?
> 
> The plan as usual would be to add it the series adding that support.
> Not sure what the merge plans are - if the series is ready to be
> merged I could rebase on top of it, otherwise that series will need
> to add the method.
OK I think your series may be upstreamed first.

Thanks

Eric
> 



Re: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING

2021-03-14 Thread Auger Eric
Hi Christoph,

On 3/1/21 9:42 AM, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 40 ++---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 30 ++--
>  drivers/iommu/intel/iommu.c | 28 +--
>  drivers/iommu/iommu.c   |  8 +
>  drivers/vfio/vfio_iommu_type1.c |  5 +--
>  include/linux/iommu.h   |  4 ++-
>  6 files changed, 50 insertions(+), 65 deletions(-)

As mentioned by Robin, there are series planning to use
DOMAIN_ATTR_NESTING to get info about the nested caps of the iommu (ARM
and Intel):

[Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs
patches 1, 2, 3

Is the plan to introduce a new domain_get_nesting_info ops then?
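
As a side note, the control flow of the new arm_smmu_domain_enable_nesting()
in this patch can be modeled as follows. This is a simplified userspace
sketch, not driver code: the model_* names and the 'attached' flag (standing
in for smmu_domain->smmu being set) are assumptions made for illustration,
and the init_mutex locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum domain_type { DOMAIN_UNMANAGED, DOMAIN_DMA };
enum domain_stage { STAGE_S1, STAGE_NESTED };

/* Simplified stand-in for struct arm_smmu_domain. */
struct model_domain {
	enum domain_type type;
	enum domain_stage stage;
	bool attached;	/* models smmu_domain->smmu != NULL */
};

static int model_enable_nesting(struct model_domain *d)
{
	/* Only unmanaged (VFIO-style) domains may be switched to nesting. */
	if (d->type != DOMAIN_UNMANAGED)
		return -EINVAL;
	/* Once the domain is attached to an SMMU the stage is fixed. */
	if (d->attached)
		return -EPERM;
	d->stage = STAGE_NESTED;
	return 0;
}
```

Compared with the removed set_attr path, a DMA domain now gets -EINVAL
instead of -ENODEV, and there is no longer a way to switch back to stage 1
through this call.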

Thanks

Eric


> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index bf96172e8c1f71..8e6fee3ea454d3 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2466,41 +2466,21 @@ static void arm_smmu_dma_enable_flush_queue(struct 
> iommu_domain *domain)
>   to_smmu_domain(domain)->non_strict = true;
>  }
>  
> -static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
> - enum iommu_attr attr, void *data)
> +static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain)
>  {
> - int ret = 0;
>   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + int ret = -EPERM;
>  
> - mutex_lock(&smmu_domain->init_mutex);
> + if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> + return -EINVAL;
>  
> - switch (domain->type) {
> - case IOMMU_DOMAIN_UNMANAGED:
> - switch (attr) {
> - case DOMAIN_ATTR_NESTING:
> - if (smmu_domain->smmu) {
> - ret = -EPERM;
> - goto out_unlock;
> - }
> -
> - if (*(int *)data)
> - smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> - else
> - smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> - break;
> - default:
> - ret = -ENODEV;
> - }
> - break;
> - case IOMMU_DOMAIN_DMA:
> - ret = -ENODEV;
> - break;
> - default:
> - ret = -EINVAL;
> + mutex_lock(&smmu_domain->init_mutex);
> + if (!smmu_domain->smmu) {
> + smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> + ret = 0;
>   }
> -
> -out_unlock:
>   mutex_unlock(&smmu_domain->init_mutex);
> +
>   return ret;
>  }
>  
> @@ -2603,7 +2583,7 @@ static struct iommu_ops arm_smmu_ops = {
>   .device_group   = arm_smmu_device_group,
>   .dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
>   .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
> - .domain_set_attr= arm_smmu_domain_set_attr,
> + .domain_enable_nesting  = arm_smmu_domain_enable_nesting,
>   .of_xlate   = arm_smmu_of_xlate,
>   .get_resv_regions   = arm_smmu_get_resv_regions,
>   .put_resv_regions   = generic_iommu_put_resv_regions,
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index e7893e96f5177a..2e17d990d04481 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1497,6 +1497,24 @@ static void arm_smmu_dma_enable_flush_queue(struct 
> iommu_domain *domain)
>   to_smmu_domain(domain)->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
>  }
>  
> +static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain)
> +{
> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + int ret = -EPERM;
> + 
> + if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> + return -EINVAL;
> +
> + mutex_lock(&smmu_domain->init_mutex);
> + if (!smmu_domain->smmu) {
> + smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> + ret = 0;
> + }
> + mutex_unlock(&smmu_domain->init_mutex);
> +
> + return ret;
> +}
> +
>  static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
>   enum iommu_attr attr, void *data)
>  {
> @@ -1508,17 +1526,6 @@ static int arm_smmu_domain_set_attr(struct 
> iommu_domain *domain,
>   switch(domain->type) {
>   case IOMMU_DOMAIN_UNMANAGED:
>   switch (attr) {
> - case DOMAIN_ATTR_NESTING:
> - if (smmu_domain->smmu) {
> - ret = -EPERM;
> - goto out_unlock;
> - }
> -
> - if (*(int *)data)
> - smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> -   

Re: [PATCH v12 03/13] vfio: VFIO_IOMMU_SET_MSI_BINDING

2021-03-08 Thread Auger Eric
Hi Jean,

On 3/5/21 11:45 AM, Jean-Philippe Brucker wrote:
> Hi,
> 
> On Tue, Feb 23, 2021 at 10:06:15PM +0100, Eric Auger wrote:
>> This patch adds the VFIO_IOMMU_SET_MSI_BINDING ioctl which aims
>> to (un)register the guest MSI binding to the host. This latter
>> then can use those stage 1 bindings to build a nested stage
>> binding targeting the physical MSIs.
> 
> Now that RMR is in the IORT spec, could it be used for the nested MSI
> problem?  For virtio-iommu tables I was planning to do it like this:
> 
> MSI is mapped at stage-2 with an arbitrary IPA->doorbell PA. We report
> this IPA to userspace through iommu_groups/X/reserved_regions. No change
> there. Then to the guest we report a reserved identity mapping at IPA
> (using RMR, an equivalent DT binding, or probed RESV_MEM for
> virtio-iommu).

Is there any DT binding equivalent?

> The guest creates that mapping at stage-1, and that's it.
> Unless I overlooked something we'd only reuse existing infrastructure and
> avoid the SET_MSI_BINDING interface.

Yes at first glance I think this should work. The guest SMMU driver will
continue allocating IOVA for MSIs but I think that's not an issue as
they won't be used.

For the SMMU case this makes the guest behavior different from the
baremetal one though. Typically you will never get any S1 fault. Also
the S1 mapping is static and direct.

I will prototype this too.
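
To make the proposal concrete, here is a small userspace model of the
translation path discussed above: the guest installs an identity stage 1
mapping over the reserved region, and the host stage 2 maps the same IPA
window onto the doorbell. All names and the single-region layout are
assumptions for illustration. A real guest has other stage 1 mappings too,
which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct region {
	uint64_t start;
	uint64_t size;
};

static bool in_region(const struct region *r, uint64_t addr)
{
	return addr >= r->start && addr - r->start < r->size;
}

/* Stage 1: identity inside the reserved region, fault elsewhere. */
static bool s1_translate(const struct region *resv, uint64_t iova,
			 uint64_t *ipa)
{
	if (!in_region(resv, iova))
		return false;
	*ipa = iova;
	return true;
}

/* Stage 2: the host maps the reserved IPA window onto the doorbell PA. */
static bool s2_translate(const struct region *resv, uint64_t db_pa,
			 uint64_t ipa, uint64_t *pa)
{
	if (!in_region(resv, ipa))
		return false;
	*pa = db_pa + (ipa - resv->start);
	return true;
}
```

With this layout the device would be programmed with an address inside the
reserved window, and the stage 1 step never faults for MSI writes, which is
the behavior difference from bare metal mentioned above.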

Thanks

Eric
> 
> Thanks,
> Jean
> 



Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-02-25 Thread Auger Eric
Hi Shameer, all

On 2/23/21 9:56 PM, Eric Auger wrote:
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
> 
> This is based on Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/
> 
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 
> Then those capabilities get implemented in the SMMUv3 driver.
> 
> The virtualizer passes information through the VFIO user API
> which cascades them to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/v5.11-stallv12-2stage-v14
> (including the VFIO part in its last version: v12)

As committed, I have rebased the iommu + vfio part on top of Jean's
sva/current (5.11-rc4).

https://github.com/eauger/linux/tree/jean_sva_current_2stage_v14

I have not tested the SVA bits but I have tested there is no regression
from my pov.

From the QEMU perspective it works off the shelf with that branch, but if
you want to use other SVA-related ioctls, please remember to update the
Linux headers.

Again thank you to all of you who reviewed and tested the previous version.

Thanks

Eric
> 
> The VFIO series is sent separately.
> 
> History:
> 
> Previous version (v13):
> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
> 
> v13 -> v14:
> - Took into account all received comments I think. Great
>   thanks to all the testers for their effort and sometimes
>   fixes. I am really grateful to you!
> - numerous fixes including guest running in
>   noiommu, iommu.strict=0, iommu.passthrough=on,
>   enable_unsafe_noiommu_mode
> 
> v12 -> v13:
> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>   reported by Shameer. This urged me to revisit patch 4 into
>   iommu/smmuv3: Allow s1 and s2 configs to coexist where
>   s1_cfg and s2_cfg are not dynamically allocated anymore.
>   Instead I use a new set field in existing structs
> - fixed 2 others config checks
> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>   according to the last version
> 
> v11 -> v12:
> - rebase on top of v5.10-rc4
> 
> Eric Auger (13):
>   iommu: Introduce attach/detach_pasid_table API
>   iommu: Introduce bind/unbind_guest_msi
>   iommu/smmuv3: Allow s1 and s2 configs to coexist
>   iommu/smmuv3: Get prepared for nested stage support
>   iommu/smmuv3: Implement attach/detach_pasid_table
>   iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>   iommu/smmuv3: Implement cache_invalidate
>   dma-iommu: Implement NESTED_MSI cookie
>   iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>   iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
> regions
>   iommu/smmuv3: Implement bind/unbind_guest_msi
>   iommu/smmuv3: report additional recoverable faults
>   iommu/smmuv3: Accept configs with more than one context descriptor
> 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 444 ++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +-
>  drivers/iommu/dma-iommu.c   | 142 ++-
>  drivers/iommu/iommu.c   | 106 +
>  include/linux/dma-iommu.h   |  16 +
>  include/linux/iommu.h   |  47 +++
>  include/uapi/linux/iommu.h  |  54 +++
>  7 files changed, 781 insertions(+), 42 deletions(-)
> 



Re: [PATCH v11 04/13] vfio/pci: Add VFIO_REGION_TYPE_NESTED region type

2021-02-23 Thread Auger Eric
Hi Shenming,

On 2/23/21 1:45 PM, Shenming Lu wrote:
>> +static int vfio_pci_dma_fault_init(struct vfio_pci_device *vdev)
>> +{
>> +struct vfio_region_dma_fault *header;
>> +struct iommu_domain *domain;
>> +size_t size;
>> +bool nested;
>> +int ret;
>> +
>> +domain = iommu_get_domain_for_dev(&vdev->pdev->dev);
>> +ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nested);
>> +if (ret || !nested)
>> +return ret;
> 
> Hi Eric,
> 
> It seems that the type of nested should be int, the use of bool might trigger
> a panic in arm_smmu_domain_get_attr().

Thank you. That's fixed now.
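
For the record, the problem can be illustrated with a small userspace model.
It assumes, as Shenming's remark implies, that the attribute callback stores
its result through an int pointer. The model_* names are hypothetical and
only stand in for the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a get_attr callback that reports the value as an int. */
static int model_get_nesting(void *data, bool domain_is_nested)
{
	*(int *)data = domain_is_nested;	/* writes sizeof(int) bytes */
	return 0;
}

static int model_query_nested(bool domain_is_nested, bool *out)
{
	int nested = 0;	/* must be int: the callee stores an int */
	int ret = model_get_nesting(&nested, domain_is_nested);

	if (ret)
		return ret;
	*out = nested != 0;
	return 0;
}
```

Had 'nested' been declared as bool, the callee would write sizeof(int) bytes
into a smaller object, which is undefined behavior and can corrupt
neighbouring stack variables.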

Best Regards

Eric
> 
> Thanks,
> Shenming
> 



Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-22 Thread Auger Eric
Hi Keqian,

On 2/22/21 1:20 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/22 18:53, Auger Eric wrote:
>> Hi Keqian,
>>
>> On 2/2/21 1:34 PM, Keqian Zhu wrote:
>>> Hi Eric,
>>>
>>> On 2020/11/16 19:00, Eric Auger wrote:
>>>> From: "Liu, Yi L" 
>>>>
>>>> This patch adds an VFIO_IOMMU_SET_PASID_TABLE ioctl
>>>> which aims to pass the virtual iommu guest configuration
>>>> to the host. This latter takes the form of the so-called
>>>> PASID table.
>>>>
>>>> Signed-off-by: Jacob Pan 
>>>> Signed-off-by: Liu, Yi L 
>>>> Signed-off-by: Eric Auger 
>>>>
>>>> ---
>>>> v11 -> v12:
>>>> - use iommu_uapi_set_pasid_table
>>>> - check SET and UNSET are not set simultaneously (Zenghui)
>>>>
>>>> v8 -> v9:
>>>> - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
>>>>   VFIO_IOMMU_SET_PASID_TABLE ioctl.
>>>>
>>>> v6 -> v7:
>>>> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
>>>>
>>>> v3 -> v4:
>>>> - restore ATTACH/DETACH
>>>> - add unwind on failure
>>>>
>>>> v2 -> v3:
>>>> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>>>>
>>>> v1 -> v2:
>>>> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
>>>> - remove the struct device arg
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 65 +
>>>>  include/uapi/linux/vfio.h   | 19 ++
>>>>  2 files changed, 84 insertions(+)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>> index 67e827638995..87ddd9e882dc 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
>>>> vfio_iommu *iommu,
>>>>return ret;
>>>>  }
>>>>  
>>>> +static void
>>>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>>>> +{
>>>> +  struct vfio_domain *d;
>>>> +
>>>> +  mutex_lock(&iommu->lock);
>>>> +  list_for_each_entry(d, &iommu->domain_list, next)
>>>> +  iommu_detach_pasid_table(d->domain);
>>>> +
>>>> +  mutex_unlock(&iommu->lock);
>>>> +}
>>>> +
>>>> +static int
>>>> +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
>>>> +{
>>>> +  struct vfio_domain *d;
>>>> +  int ret = 0;
>>>> +
>>>> +  mutex_lock(&iommu->lock);
>>>> +
>>>> +  list_for_each_entry(d, &iommu->domain_list, next) {
>>>> +  ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
>>>> *)arg);
>>> This design is not very clear to me. This assumes all iommu_domains share 
>>> the same pasid table.
>>>
>>> As I understand, it's reasonable when there is only one group in the 
>>> domain, and only one domain in the vfio_iommu.
>>> If more than one group in the vfio_iommu, the guest may put them into 
>>> different guest iommu_domain, then they have different pasid table.
>>>
>>> Is this the use scenario?
>>
>> the vfio_iommu is attached to a container. all the groups within a
>> container share the same set of page tables (linux
>> Documentation/driver-api/vfio.rst). So to me if you want to use
>> different pasid tables, the groups need to be attached to different
>> containers. Does that make sense to you?
> OK, so this is what I understand about the design. A little question is that 
> when
> we perform attach_pasid_table on a container, maybe we ought to do a sanity
> check to make sure that only one group is in this container, instead of
> iterating all domain?
> 
> To be frank, my main concern is that if we put each group into different 
> container
> under nested mode, then we give up the possibility that they can share stage2 
> page tables,
> which saves host memory and reduces the time of preparing environment for VM.

Referring to the QEMU integration, when you use a virtual IOMMU, there
is generally one VFIO container per viommu protected device
(AddressSpace), regardless of whether nested stage is being used. I
think the exception is if you put 2 assigned devices behind a virtual
PCIe to PCI bridge (pcie-pci-bridge), in that case they hav

Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-22 Thread Auger Eric
Hi Keqian,

On 2/2/21 1:34 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/16 19:00, Eric Auger wrote:
>> From: "Liu, Yi L" 
>>
>> This patch adds an VFIO_IOMMU_SET_PASID_TABLE ioctl
>> which aims to pass the virtual iommu guest configuration
>> to the host. This latter takes the form of the so-called
>> PASID table.
>>
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v11 -> v12:
>> - use iommu_uapi_set_pasid_table
>> - check SET and UNSET are not set simultaneously (Zenghui)
>>
>> v8 -> v9:
>> - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
>>   VFIO_IOMMU_SET_PASID_TABLE ioctl.
>>
>> v6 -> v7:
>> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
>>
>> v3 -> v4:
>> - restore ATTACH/DETACH
>> - add unwind on failure
>>
>> v2 -> v3:
>> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>>
>> v1 -> v2:
>> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
>> - remove the struct device arg
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 65 +
>>  include/uapi/linux/vfio.h   | 19 ++
>>  2 files changed, 84 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 67e827638995..87ddd9e882dc 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
>> vfio_iommu *iommu,
>>  return ret;
>>  }
>>  
>> +static void
>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>> +{
>> +struct vfio_domain *d;
>> +
>> +mutex_lock(&iommu->lock);
>> +list_for_each_entry(d, &iommu->domain_list, next)
>> +iommu_detach_pasid_table(d->domain);
>> +
>> +mutex_unlock(&iommu->lock);
>> +}
>> +
>> +static int
>> +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
>> +{
>> +struct vfio_domain *d;
>> +int ret = 0;
>> +
>> +mutex_lock(&iommu->lock);
>> +
>> +list_for_each_entry(d, &iommu->domain_list, next) {
>> +ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
>> *)arg);
> This design is not very clear to me. This assumes all iommu_domains share the 
> same pasid table.
> 
> As I understand, it's reasonable when there is only one group in the domain, 
> and only one domain in the vfio_iommu.
> If more than one group in the vfio_iommu, the guest may put them into 
> different guest iommu_domain, then they have different pasid table.
> 
> Is this the use scenario?

the vfio_iommu is attached to a container. all the groups within a
container share the same set of page tables (linux
Documentation/driver-api/vfio.rst). So to me if you want to use
different pasid tables, the groups need to be attached to different
containers. Does that make sense to you?
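
Since every domain of the container receives the same PASID table, the attach
path is all-or-nothing. The pattern used by vfio_attach_pasid_table() in the
quoted patch can be sketched in plain C as follows. The array-based model and
the model_* names are assumptions for illustration, standing in for the
domain list and the iommu attach/detach calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dom {
	bool attached;
	bool fail;	/* force an attach failure for this entry */
};

static int model_attach(struct model_dom *d)
{
	if (d->fail)
		return -1;
	d->attached = true;
	return 0;
}

static void model_detach(struct model_dom *d)
{
	d->attached = false;
}

/* Attach every domain; on failure, detach the ones already attached. */
static int model_attach_all(struct model_dom *doms, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = model_attach(&doms[i]);
		if (ret)
			break;
	}
	if (ret)
		while (i-- > 0)	/* walk back from the entry before the failure */
			model_detach(&doms[i]);
	return ret;
}
```

The reverse walk starts at the entry preceding the one that failed, which is
the same behavior list_for_each_entry_continue_reverse() gives in the patch.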

Thanks

Eric
> 
> Thanks,
> Keqian
> 
>> +if (ret)
>> +goto unwind;
>> +}
>> +goto unlock;
>> +unwind:
>> +list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
>> +iommu_detach_pasid_table(d->domain);
>> +}
>> +unlock:
>> +mutex_unlock(&iommu->lock);
>> +return ret;
>> +}
>> +
>>  static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
>> struct vfio_info_cap *caps)
>>  {
>> @@ -2747,6 +2782,34 @@ static int vfio_iommu_type1_unmap_dma(struct 
>> vfio_iommu *iommu,
>>  -EFAULT : 0;
>>  }
>>  
>> +static int vfio_iommu_type1_set_pasid_table(struct vfio_iommu *iommu,
>> +unsigned long arg)
>> +{
>> +struct vfio_iommu_type1_set_pasid_table spt;
>> +unsigned long minsz;
>> +int ret = -EINVAL;
>> +
>> +minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, flags);
>> +
>> +if (copy_from_user(&spt, (void __user *)arg, minsz))
>> +return -EFAULT;
>> +
>> +if (spt.argsz < minsz)
>> +return -EINVAL;
>> +
>> +if (spt.flags & VFIO_PASID_TABLE_FLAG_SET &&
>> +spt.flags & VFIO_PASID_TABLE_FLAG_UNSET)
>> +return -EINVAL;
>> +
>> +if (spt.flags & VFIO_PASID_TABLE_FLAG_SET)
>> +ret = vfio_attach_pasid_table(iommu, arg + minsz);
>> +else if (spt.flags & VFIO_PASID_TABLE_FLAG_UNSET) {
>> +vfio_detach_pasid_table(iommu);
>> +ret = 0;
>> +}
>> +return ret;
>> +}
>> +
>>  static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>>  unsigned long arg)
>>  {
>> @@ -2867,6 +2930,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  return vfio_iommu_type1_unmap_dma(iommu, arg);
>>  case VFIO_IOMMU_DIRTY_PAGES:
>>  return vfio_iommu_type1_dirty_pages(iommu, arg);
>> +case VFIO_IOMMU_SET_PASID_TABLE:
>> +return vfio_iommu_type1_set_pasid_table(iommu, arg);
>>  default:
>>  return -ENOTTY;
>>  }
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 2f313a238a8f..78ce3ce6c331 100644
>> 

Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-02-21 Thread Auger Eric
Hi Shameer,
On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 18 November 2020 11:22
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>> alex.william...@redhat.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>> nicoleots...@gmail.com; yuzenghui 
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities get implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
> 
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
> 
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 
> 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 
> Flags: fast devsel
> Memory at 800010 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 80 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> 
> root@ubuntu:/# echo vfio-pci > 
> /sys/bus/pci/devices/:00:02.0/driver_override
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix 
> socket0  -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: :00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool : n=155456, size=2176, 
> socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> 
> Warning! port-topology=paired and odd forward ports number, the last port 
> will pair with itself.
> 
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
> 
> 2). Guest have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 
> 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 
> Flags: bus master, fast devsel, latency 0
> Memory at 800010 (64-bit, prefetchable) [size=64K]
> Memory at 80 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
> 
> root@ubuntu:/# echo vfio-pci > 
> /sys/bus/pci/devices/:00:02.0/driver_override
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix 
> socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe 

Re: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region

2021-02-18 Thread Auger Eric
Hi Shameer,

On 2/18/21 11:36 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>>> -Original Message-
>>> From: Eric Auger [mailto:eric.au...@redhat.com]
>>> Sent: 16 November 2020 11:00
>>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>>> alex.william...@redhat.com
>>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>>> Thodi ;
>>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>>> nicoleots...@gmail.com; yuzenghui 
>>> Subject: [PATCH v11 12/13] vfio/pci: Register a DMA fault response
>>> region
>>>
>>> In preparation for vSVA, let's register a DMA fault response region,
>>> where the userspace will push the page responses and increment the
>>> head of the buffer. The kernel will pop those responses and inject
>>> them on iommu side.
>>>
>>> Signed-off-by: Eric Auger 
>>> ---
>>>  drivers/vfio/pci/vfio_pci.c | 114 +---
>>>  drivers/vfio/pci/vfio_pci_private.h |   5 ++
>>>  drivers/vfio/pci/vfio_pci_rdwr.c|  39 ++
>>>  include/uapi/linux/vfio.h   |  32 
>>>  4 files changed, 181 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>> index 65a83fd0e8c0..e9a904ce3f0d 100644
>>> --- a/drivers/vfio/pci/vfio_pci.c
>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>> @@ -318,9 +318,20 @@ static void vfio_pci_dma_fault_release(struct
>>> vfio_pci_device *vdev,
>>> kfree(vdev->fault_pages);
>>>  }
>>>
>>> -static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> -  struct vfio_pci_region *region,
>>> -  struct vm_area_struct *vma)
>>> +static void
>>> +vfio_pci_dma_fault_response_release(struct vfio_pci_device *vdev,
>>> +   struct vfio_pci_region *region) {
>>> +   if (vdev->dma_fault_response_wq)
>>> +   destroy_workqueue(vdev->dma_fault_response_wq);
>>> +   kfree(vdev->fault_response_pages);
>>> +   vdev->fault_response_pages = NULL;
>>> +}
>>> +
>>> +static int __vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> +struct vfio_pci_region *region,
>>> +struct vm_area_struct *vma,
>>> +u8 *pages)
>>>  {
>>> u64 phys_len, req_len, pgoff, req_start;
>>> unsigned long long addr;
>>> @@ -333,14 +344,14 @@ static int vfio_pci_dma_fault_mmap(struct
>>> vfio_pci_device *vdev,
>>> ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>>> req_start = pgoff << PAGE_SHIFT;
>>>
>>> -   /* only the second page of the producer fault region is mmappable */
>>> +   /* only the second page of the fault region is mmappable */
>>> if (req_start < PAGE_SIZE)
>>> return -EINVAL;
>>>
>>> if (req_start + req_len > phys_len)
>>> return -EINVAL;
>>>
>>> -   addr = virt_to_phys(vdev->fault_pages);
>>> +   addr = virt_to_phys(pages);
>>> vma->vm_private_data = vdev;
>>> vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
>>>
>>> @@ -349,13 +360,29 @@ static int vfio_pci_dma_fault_mmap(struct
>>> vfio_pci_device *vdev,
>>> return ret;
>>>  }
>>>
>>> -static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
>>> -struct vfio_pci_region *region,
>>> -struct vfio_info_cap *caps)
>>> +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> +  struct vfio_pci_region *region,
>>> +  struct vm_area_struct *vma)
>>> +{
>>> +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
>>> vdev->fault_pages);
>>> +}
>>> +
>>> +static int
>>> +vfio_pci_dma_fault_response_mmap(struct vfio_pci_device *vdev,
>>> +   struct vfio_pci_region *region,
>>> +   struct vm_area_struct *vma)
>>> +{
>>> +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
>>> vdev->fault_response_pages);
>>> +}
>>> +
>>> +static int __vfio_pci_dma_fault_add_capability(struct vfio_pci_device 
>>> *vdev,
>>> +  struct vfio_pci_region *region,
>>> +  struct vfio_info_cap *caps,
>>> +  u32 cap_id)
>>>  {
>>> struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
>>> struct vfio_region_info_cap_fault cap = {
>>> -   .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
>>> +   .header.id = cap_id,
>>> .header.version = 1,
>>> .version = 1,
>>> };
>>> @@ -383,6 +410,14 @@ static int
>>> vfio_pci_dma_fault_add_capability(struct
>>> 

Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-18 Thread Auger Eric
Hi Keqian,

On 2/18/21 9:43 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/12 16:55, Auger Eric wrote:
>> Hi Keqian,
>>
>> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>>> Hi Eric,
>>>
>>> On 2020/11/18 19:21, Eric Auger wrote:
>>>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>>>> for each MSI doorbell. If both the host and the guest are exposed
>>>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>>>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>>>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>>>> onto the physical doorbell (hDB).
>>>>
>>>> So we end up with 2 untied mappings:
>>>>  S1S2
>>>> gIOVA->gDB
>>>>   hIOVA->hDB
>>>>
>>>> Currently the PCI device is programmed by the host with hIOVA
>>>> as MSI doorbell. So this does not work.
>>>>
>>>> This patch introduces an API to pass gIOVA/gDB to the host so
>>>> that gIOVA can be reused by the host instead of re-allocating
>>>> a new IOVA. So the goal is to create the following nested mapping:
>>> Does the gDB can be reused under non-nested mode?
>>
>> Under non nested mode the hIOVA is allocated within the MSI reserved
>> region exposed by the SMMU driver, [0x800, 80f]. see
>> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma_iommu.c. this hIOVA
>> is programmed in the physical device so that the physical SMMU
>> translates it into the physical doorbell (hDB = host physical ITS
> So, AFAIU, under non-nested mode, at smmu side, we reuse the workflow of 
> non-virtualization scenario.
Without virtualization, the host kernel also transparently allocates an
iova to map the doorbell. With standard passthrough without vIOMMU, the
iova window is different (MSI RESV region).
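
To contrast this with the nested case from the commit message, here is a toy
model of the two-stage walk. Each stage holds a single mapping, and the
struct and function names are invented for illustration. It only shows why
the device must be programmed with gIOVA rather than hIOVA once stage 1 is
owned by the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One translation entry: input address to output address. */
struct mapping {
	uint64_t in;
	uint64_t out;
};

static bool lookup(const struct mapping *m, uint64_t in, uint64_t *out)
{
	if (m->in != in)
		return false;	/* models a translation fault */
	*out = m->out;
	return true;
}

/* Nested walk: the output of stage 1 is the input of stage 2. */
static bool nested_translate(const struct mapping *s1,
			     const struct mapping *s2,
			     uint64_t iova, uint64_t *pa)
{
	uint64_t ipa;

	return lookup(s1, iova, &ipa) && lookup(s2, ipa, pa);
}
```

With S1 = gIOVA->gDB and S2 = gDB->hDB, a write to gIOVA reaches hDB, while a
write to hIOVA faults at stage 1 because the guest never mapped it.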

Thanks

Eric
> 
>> doorbell). The gDB is not used at pIOMMU programming level. It is only
>> used when setting up the KVM irq route.
>>
>> Hope this answers your question.
> Thanks for your explanation!
>>
> 
> Thanks,
> Keqian
> 
>>>
>>>>
>>>>  S1S2
>>>> gIOVA->gDB ->hDB
>>>>
>>>> and program the PCI device with gIOVA MSI doorbell.
>>>>
>>>> In case we have several devices attached to this nested domain
>>>> (devices belonging to the same group), they cannot be isolated
>>>> on guest side either. So they should also end up in the same domain
>>>> on guest side. We will enforce that all the devices attached to
>>>> the host iommu domain use the same physical doorbell and similarly
>>>> a single virtual doorbell mapping gets registered (1 single
>>>> virtual doorbell is used on guest as well).
>>>>
>>> [...]
>>>
>>>> + *
>>>> + * The associated IOVA can be reused by the host to create a nested
>>>> + * stage2 binding mapping translating into the physical doorbell used
>>>> + * by the devices attached to the domain.
>>>> + *
>>>> + * All devices within the domain must share the same physical doorbell.
>>>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>>>> + */
>>>> +
>>>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>>>> +   dma_addr_t giova, phys_addr_t gpa, size_t size)
>>>> +{
>>>> +  if (unlikely(!domain->ops->bind_guest_msi))
>>>> +  return -ENODEV;
>>>> +
>>>> +  return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>>>> +
>>>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>>>> +  dma_addr_t iova)
>>> nit: s/iova/giova
>> sure
>>>
>>>> +{
>>>> +  if (unlikely(!domain->ops->unbind_guest_msi))
>>>> +  return;
>>>> +
>>>> +  domain->ops->unbind_guest_msi(domain, iova);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>>>> +
>>> [...]
>>>
>>> Thanks,
>>> Keqian
>>>
>>
>> Thanks
>>
>> Eric
>>
>> .
>>
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-02-15 Thread Auger Eric
Hi Shameer,

On 12/3/20 7:42 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: kvmarm-boun...@lists.cs.columbia.edu
>> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Auger Eric
>> Sent: 01 December 2020 13:59
>> To: wangxingang 
>> Cc: Xieyingtai ; jean-phili...@linaro.org;
>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> vivek.gau...@arm.com; alex.william...@redhat.com;
>> zhangfei@linaro.org; robin.mur...@arm.com;
>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com
>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>> unmanaged ASIDs
>>
>> Hi Xingang,
>>
>> On 12/1/20 2:33 PM, Xingang Wang wrote:
>>> Hi Eric
>>>
>>> On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>>>> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>>> * insertion to guarantee those are observed before the TLBI. Do be
>>>> * careful, 007.
>>>> */
>>>> -  if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>> +  if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>>> +  cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
>>>> +  cmd.tlbi.asid   = ext_asid;
>>>> +  cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>>>> +  } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>
>>> Found a problem here, the cmd for guest stage 1 invalidation is built,
>>> but it is not delivered to smmu.
>>>
>>
>> Thank you for the report. I will fix that soon. With that fixed, have
>> you been able to run vSVA on top of the series. Do you need other stuff
>> to be fixed at SMMU level? 
> 
> I am seeing another issue with this series. This is when you have the vSMMU
> in non-strict mode (iommu.strict=0). Any network pass-through dev with an
> iperf run will be enough to reproduce the issue. It may randomly stop/hang.
> 
> It looks like the .flush_iotlb_all from the guest is not propagated down to
> the host correctly. I have a temp hack to fix this in Qemu wherein
> CMDQ_OP_TLBI_NH_ASID will result in a CACHE_INVALIDATE with
> IOMMU_INV_GRANU_PASID flag and archid set.

Thank you for the analysis. Indeed the NH_ASID was not properly handled
as asid info was not passed down. I fixed domain invalidation and added
asid based invalidation.

Thanks

Eric
> 
> Please take a look and let me know. 
> 
>> As I am going to respin soon, please let me
>> know what is the best branch to rebase to alleviate your integration.
> 
> Please find the latest kernel and Qemu branch with vSVA support added here,
> 
> https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva
> https://github.com/hisilicon/qemu/tree/v5.2.0-rc1-2stage-rfcv7-vsva
> 
> I have done some basic minimum vSVA tests on a HiSilicon D06 board with
> a zip dev that supports STALL. All looks good so far apart from the issues
> that have been already reported/discussed.
> 
> The kernel branch is actually a rebase of sva/uacce related patches from a
> Linaro branch here,
> 
> https://github.com/Linaro/linux-kernel-uadk/tree/uacce-devel-5.10
> 
> I think going forward it will be good (if possible) to respin your series on
> top of a sva branch with STALL/PRI support added.
> 
> Hi Jean/zhangfei,
> Is it possible to have a branch with minimum required SVA/UACCE related
> patches that are already public and can be a "stable" candidate for future
> respin of Eric's series?
> Please share your thoughts.
> 
> Thanks,
> Shameer 
> 
>> Best Regards
>>
>> Eric
>>
>> ___
>> kvmarm mailing list
>> kvm...@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 



Re: [PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1

2021-02-12 Thread Auger Eric
Hi Vivek,

On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Update nested domain information required for stage1 page table.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c11dd3940583..728018921fae 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   void *data)
>  {
>   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
>   unsigned int size;
>  
>   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> @@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   return 0;
>   }
>  
> - /* report an empty iommu_nesting_info for now */
> - memset(info, 0x0, size);
> + /* Update the nesting info as required for stage1 page tables */
> + info->addr_width = smmu->ias;
> + info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
> + info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
> +  IOMMU_NESTING_FEAT_PAGE_RESP |
IOMMU_NESTING_FEAT_PAGE_RESP definition is missing too

Eric
> +  IOMMU_NESTING_FEAT_CACHE_INVLD;
> + info->pasid_bits = smmu->ssid_bits;
> + info->vendor.smmuv3.asid_bits = smmu->asid_bits;
> + info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
> + memset(&info->padding, 0x0, 12);
> + memset(&info->vendor.smmuv3.padding, 0x0, 9);
> +
>   info->argsz = size;
> +
>   return 0;
>  }
>  
> 



Re: [PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1

2021-02-12 Thread Auger Eric
Hi Vivek,

On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Update nested domain information required for stage1 page table.

s/reuqired/required in the commit title
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c11dd3940583..728018921fae 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   void *data)
>  {
>   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
>   unsigned int size;
>  
>   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> @@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   return 0;
>   }
>  
> - /* report an empty iommu_nesting_info for now */
> - memset(info, 0x0, size);
> + /* Update the nesting info as required for stage1 page tables */
> + info->addr_width = smmu->ias;
> + info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
> + info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
I understood IOMMU_NESTING_FEAT_BIND_PGTBL advertises the requirement to
bind tables per PASID, ie. passing iommu_gpasid_bind_data.
In ARM case I guess you plan to use attach/detach_pasid_table API with
iommu_pasid_table_config struct. So I understood we should add a new
feature here.
> +  IOMMU_NESTING_FEAT_PAGE_RESP |
> +  IOMMU_NESTING_FEAT_CACHE_INVLD;
> + info->pasid_bits = smmu->ssid_bits;
> + info->vendor.smmuv3.asid_bits = smmu->asid_bits;
> + info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
> + memset(&info->padding, 0x0, 12);
> + memset(&info->vendor.smmuv3.padding, 0x0, 9);
> +
>   info->argsz = size;
> +
spurious new line
>   return 0;
>  }
>  
> 



Re: [PATCH 1/2] iommu: Report domain nesting info for arm-smmu-v3

2021-02-12 Thread Auger Eric
Hi Vivek,
On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Add a vendor specific structure for domain nesting info for
> arm smmu-v3, and necessary info fields required to populate
> stage1 page tables.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  include/uapi/linux/iommu.h | 31 +--
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 4d3d988fa353..5f059bcf7720 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -323,7 +323,8 @@ struct iommu_gpasid_bind_data {
>  #define IOMMU_GPASID_BIND_VERSION_1  1
>   __u32 version;
>  #define IOMMU_PASID_FORMAT_INTEL_VTD 1
> -#define IOMMU_PASID_FORMAT_LAST  2
> +#define IOMMU_PASID_FORMAT_ARM_SMMU_V3   2
> +#define IOMMU_PASID_FORMAT_LAST  3
>   __u32 format;
>   __u32 addr_width;
>  #define IOMMU_SVA_GPASID_VAL (1 << 0) /* guest PASID valid */
> @@ -409,6 +410,21 @@ struct iommu_nesting_info_vtd {
>   __u64   ecap_reg;
>  };
>  
> +/*
> + * struct iommu_nesting_info_arm_smmuv3 - Arm SMMU-v3 nesting info.
> + */
> +struct iommu_nesting_info_arm_smmuv3 {
> + __u32   flags;
> + __u16   asid_bits;
> +
> + /* Arm LPAE page table format as per kernel */
> +#define ARM_PGTBL_32_LPAE_S1 (0x0)
> +#define ARM_PGTBL_64_LPAE_S1 (0x2)
Shouldn't it be a bitfield instead, as both can be supported (the actual
driver only supports the 64b table format, though)? Does it match
IDR0.TTF?
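
To illustrate the bitfield idea, here is a rough sketch. The flag names are
hypothetical and not part of the posted uapi. If I read the SMMUv3 spec
correctly, IDR0.TTF uses 0b01 for AArch32 (LPAE), 0b10 for AArch64 and 0b11
for both, so flags laid out like this would line up with it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit flags; these are not part of the posted uapi. */
#define PGTBL_FMT_32_LPAE_S1	(1u << 0)
#define PGTBL_FMT_64_LPAE_S1	(1u << 1)

/* Does the host advertise every format the guest wants to use? */
static bool pgtbl_fmts_supported(uint8_t host_fmts, uint8_t wanted)
{
	return wanted != 0 && (host_fmts & wanted) == wanted;
}
```

With such a layout userspace could check a requested format against the
reported mask before configuring the viommu.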
> + __u8    pgtbl_fmt;
So I understand this API is supposed to allow VFIO to expose this info
early enough to userspace to help configure the viommu and avoid
errors later on. I wonder how far we want to go down this path. What about
the other caps that impact the STE/CD validity? There may be others...

SMMU_IDR0.CD2L (support of 2-level CD tables)
SMMU_IDR0.TTENDIAN (endianness)
SMMU_IDR0.HTTU (if 0 forbids HA/HD setting in the CD)
SMMU_IDR3.STT (impacts T0SZ)

Thanks

Eric

> +
> + __u8    padding[9];
> +};
> +
>  /*
>   * struct iommu_nesting_info - Information for nesting-capable IOMMU.
>   *  userspace should check it before using
> @@ -445,11 +461,13 @@ struct iommu_nesting_info_vtd {
>   * +---+--+
>   *
>   * data struct types defined for @format:
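> - * +================================+=================================+
> - * | @format                        | data struct                     |
> - * +================================+=================================+
> - * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd   |
> - * +--------------------------------+---------------------------------+
> + * +================================+======================================+
> + * | @format                        | data struct                          |
> + * +================================+======================================+
> + * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd        |
> + * +--------------------------------+--------------------------------------+
> + * | IOMMU_PASID_FORMAT_ARM_SMMU_V3 | struct iommu_nesting_info_arm_smmuv3 |
> + * +--------------------------------+--------------------------------------+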
>   *
>   */
>  struct iommu_nesting_info {
> @@ -466,6 +484,7 @@ struct iommu_nesting_info {
>   /* Vendor specific data */
>   union {
>   struct iommu_nesting_info_vtd vtd;
> + struct iommu_nesting_info_arm_smmuv3 smmuv3;
>   } vendor;
>  };
>  
> 



Re: [PATCH v7 02/16] iommu/smmu: Report empty domain nesting info

2021-02-12 Thread Auger Eric
Hi Vivek, Yi,

On 2/12/21 8:14 AM, Vivek Gautam wrote:
> Hi Yi,
> 
> 
> On Sat, Jan 23, 2021 at 2:29 PM Liu, Yi L  wrote:
>>
>> Hi Eric,
>>
>>> From: Auger Eric 
>>> Sent: Tuesday, January 19, 2021 6:03 PM
>>>
>>> Hi Yi, Vivek,
>>>
>> [...]
>>>> I see. I think there needs a change in the code there. Should also expect
>>>> a nesting_info returned instead of an int anymore. @Eric, how about your
>>>> opinion?
>>>>
>>>> domain = iommu_get_domain_for_dev(>pdev->dev);
>>>> ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING,
>>> &info);
>>>> if (ret || !(info.features & IOMMU_NESTING_FEAT_PAGE_RESP)) {
>>>> /*
>>>>  * No need go futher as no page request service support.
>>>>  */
>>>> return 0;
>>>> }
>>> Sure I think it is "just" a matter of synchro between the 2 series. Yi,
>>
>> exactly.
>>
>>> do you have plans to respin part of
>>> [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
>>> or would you allow me to embed this patch in my series.
>>
>> My v7 hasn’t touched the prq change yet. So I think it's better for you to
>> embed it in your series. ^_^
>>
> 
> Can you please let me know if you have an updated series of these
> patches? It will help me to work with virtio-iommu/arm side changes.

As per the previous discussion, I plan to take those 2 patches in my
SMMUv3 nested stage series:

[PATCH v7 01/16] iommu: Report domain nesting info
[PATCH v7 02/16] iommu/smmu: Report empty domain nesting info

We need to upgrade both since we no longer want to report an empty nesting
info for ARM.

Thanks

Eric
> 
> Thanks & regards
> Vivek
> 


Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-12 Thread Auger Eric
Hi Keqian,

On 2/1/21 12:52 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>> for each MSI doorbell. If both the host and the guest are exposed
>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>> The guest allocates an IOVA (gIOVA) to map onto the guest MSI
>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>> onto the physical doorbell (hDB).
>>
>> So we end up with 2 untied mappings:
>>          S1            S2
>> gIOVA    ->    gDB
>>               hIOVA    ->    hDB
>>
>> Currently the PCI device is programmed by the host with hIOVA
>> as MSI doorbell. So this does not work.
>>
>> This patch introduces an API to pass gIOVA/gDB to the host so
>> that gIOVA can be reused by the host instead of re-allocating
>> a new IOVA. So the goal is to create the following nested mapping:
> Can the gDB be reused under non-nested mode?

Under non-nested mode the hIOVA is allocated within the MSI reserved
region exposed by the SMMU driver, [0x8000000, 0x80fffff]. See
iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma-iommu.c. This hIOVA
is programmed in the physical device so that the physical SMMU
translates it into the physical doorbell (hDB = host physical ITS
doorbell). The gDB is not used at pIOMMU programming level. It is only
used when setting up the KVM irq route.

Hope this answers your question.
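
For contrast, the nested case targeted by this patch can be modelled roughly
as below. This is only a userspace sketch, not kernel code: the struct and
function names are made up for illustration and each stage holds a single
page-sized entry. The point is that the device is programmed with gIOVA and
the write still has to land on hDB after the S1 then S2 lookups.

```c
#include <stdbool.h>
#include <stdint.h>

/* One translation entry: maps a single input page to an output page. */
struct msi_map {
	uint64_t in;
	uint64_t out;
	bool valid;
};

/* Hypothetical model of a nested domain holding one MSI binding. */
struct nested_msi_model {
	struct msi_map s1;	/* guest owned: gIOVA -> gDB */
	struct msi_map s2;	/* host owned:  gDB   -> hDB */
};

/* Translate through one stage; returns false on a translation fault. */
static bool stage_translate(const struct msi_map *m, uint64_t in, uint64_t *out)
{
	if (!m->valid || m->in != in)
		return false;
	*out = m->out;
	return true;
}

/* Nested walk: the device emits gIOVA and the write must land on hDB. */
static bool nested_translate(const struct nested_msi_model *d,
			     uint64_t giova, uint64_t *hdb)
{
	uint64_t gdb;

	return stage_translate(&d->s1, giova, &gdb) &&
	       stage_translate(&d->s2, gdb, hdb);
}
```

If the S2 entry for gDB is missing, the walk faults, which corresponds to the
"2 untied mappings" situation described in the commit message.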

> 
>>
>>          S1            S2
>> gIOVA    ->    gDB     ->    hDB
>>
>> and program the PCI device with gIOVA MSI doorbell.
>>
>> In case we have several devices attached to this nested domain
>> (devices belonging to the same group), they cannot be isolated
>> on guest side either. So they should also end up in the same domain
>> on guest side. We will enforce that all the devices attached to
>> the host iommu domain use the same physical doorbell and similarly
>> a single virtual doorbell mapping gets registered (1 single
>> virtual doorbell is used on guest as well).
>>
> [...]
> 
>> + *
>> + * The associated IOVA can be reused by the host to create a nested
>> + * stage2 binding mapping translating into the physical doorbell used
>> + * by the devices attached to the domain.
>> + *
>> + * All devices within the domain must share the same physical doorbell.
>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>> + */
>> +
>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>> + dma_addr_t giova, phys_addr_t gpa, size_t size)
>> +{
>> +if (unlikely(!domain->ops->bind_guest_msi))
>> +return -ENODEV;
>> +
>> +return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>> +
>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>> +dma_addr_t iova)
> nit: s/iova/giova
sure
> 
>> +{
>> +if (unlikely(!domain->ops->unbind_guest_msi))
>> +return;
>> +
>> +domain->ops->unbind_guest_msi(domain, iova);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>> +
> [...]
> 
> Thanks,
> Keqian
> 

Thanks

Eric



Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

2021-02-11 Thread Auger Eric
Hi Keqian,

On 2/2/21 8:14 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> When nested stage translation is setup, both s1_cfg and
>> s2_cfg are set.
>>
>> We introduce a new smmu domain abort field that will be set
>> upon guest stage1 configuration passing.
>>
>> arm_smmu_write_strtab_ent() is modified to write both stage
>> fields in the STE and deal with the abort field.
>>
>> In nested mode, only stage 2 is "finalized" as the host does
>> not own/configure the stage 1 context descriptor; guest does.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v10 -> v11:
>> - Fix an issue reported by Shameer when switching from with vSMMU
>>   to without vSMMU. Although the spec does not seem to mention it,
>>   it seems necessary to reset the 2 high 64b words when switching from
>>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>>   On some implementations, if the S2TTB is not reset, this causes
>>   a C_BAD_STE error
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>  2 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 18ac5af1b284..412ea1bafa50 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>   * three cases at the moment:
>>   *
>>   * 1. Invalid (all zero) -> bypass/fault (init)
>> - * 2. Bypass/fault -> translation/bypass (attach)
>> - * 3. Translation/bypass -> bypass/fault (detach)
>> + * 2. Bypass/fault -> single stage translation/bypass (attach)
>> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>> + * 4. S2 -> S1 + S2 (attach_pasid_table)
>> + * 5. S1 + S2 -> S2 (detach_pasid_table)
> 
> The following line "BUG_ON(ste_live && !nested);" forbids this transform.

Yes as pointed out by Kunkun, there is always an abort in-between. I
will restore the original comment.

> And I have a look at the 6th patch, the transform seems S1 + S2 -> abort.
> So after detach, the status is not the same as that before attach. Does it
> match our expectation?

Indeed, at detach time I think I should reset the abort flag, as it
is no longer imposed by the guest.
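
To spell out the intended behaviour, here is a small standalone model of how
the STE configuration is chosen from the domain state, following the
abort/translate logic of the hunk quoted below. The enum, the struct and the
function names are hypothetical stand-ins, not the driver's own, and the
detach helper only shows the flag reset discussed above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the STE Config encodings. */
enum ste_cfg {
	STE_CFG_ABORT,
	STE_CFG_BYPASS,
	STE_CFG_S1,
	STE_CFG_S2,
	STE_CFG_NESTED,
};

/* Reduced, illustrative view of the domain state used by the patch. */
struct domain_model {
	bool s1_set;
	bool s2_set;
	bool abort;
};

/* Pick the STE configuration; dom == NULL models "no domain attached". */
static enum ste_cfg ste_pick_cfg(const struct domain_model *dom,
				 bool disable_bypass)
{
	bool abort = dom ? dom->abort : disable_bypass;
	bool translate = dom && (dom->s1_set || dom->s2_set);

	if (abort)
		return STE_CFG_ABORT;
	if (!translate)
		return STE_CFG_BYPASS;
	if (dom->s1_set && dom->s2_set)
		return STE_CFG_NESTED;
	return dom->s1_set ? STE_CFG_S1 : STE_CFG_S2;
}

/* Models detach_pasid_table with the abort flag reset discussed above. */
static void model_detach_pasid_table(struct domain_model *dom)
{
	dom->s1_set = false;
	dom->abort = false;	/* no longer imposed by the guest */
}
```

With the reset in place, a domain that went S2 -> abort on guest request comes
back to plain S2 after detach, i.e. the same state as before attach.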

Thanks!

Eric


> 
>>   *
>>   * Given that we can't update the STE atomically and the SMMU
>>   * doesn't read the thing in a defined order, that leaves us
>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>   * 3. Update Config, sync
>>   */
>>  u64 val = le64_to_cpu(dst[0]);
>> -bool ste_live = false;
>> +bool s1_live = false, s2_live = false, ste_live;
>> +bool abort, nested = false, translate = false;
>>  struct arm_smmu_device *smmu = NULL;
>>  struct arm_smmu_s1_cfg *s1_cfg;
>>  struct arm_smmu_s2_cfg *s2_cfg;
>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>  default:
>>  break;
>>  }
>> +nested = s1_cfg->set && s2_cfg->set;
>> +translate = s1_cfg->set || s2_cfg->set;
>>  }
>>  
>>  if (val & STRTAB_STE_0_V) {
>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>  case STRTAB_STE_0_CFG_BYPASS:
>>  break;
>>  case STRTAB_STE_0_CFG_S1_TRANS:
>> +s1_live = true;
>> +break;
>>  case STRTAB_STE_0_CFG_S2_TRANS:
>> -ste_live = true;
>> +s2_live = true;
>> +break;
>> +case STRTAB_STE_0_CFG_NESTED:
>> +s1_live = true;
>> +s2_live = true;
>>  break;
>>  case STRTAB_STE_0_CFG_ABORT:
>> -BUG_ON(!disable_bypass);
>>  break;
>>  default:
>>  BUG(); /* STE corruption */
>>  }
>>  }
>>  
>> +ste_live = s1_live || s2_live;
>> +
>>  /* Nuke the existing STE_0 value, as we're going to rewrite it */
>>  val = STRTAB_STE_0_V;
>>  
>>  /* Bypass/fault */
>> -if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>> -if (!smmu_domain && disable_bypass)
>> +
>> +if (!smmu_domain)
>> +abort = disable_bypass;
>> +else
>> +abort = smmu_domain->abort;
>> +
>> +if (abort || !translate) {
>> +if (abort)
>>  val |= FIELD_PREP(STRTAB_STE_0_CFG, 
>> STRTAB_STE_0_CFG_ABORT);
>>  else
>>  val |= FIELD_PREP(STRTAB_STE_0_CFG, 
>> 

Re: [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table

2021-02-11 Thread Auger Eric
Hi Keqian,

On 2/2/21 9:03 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> On attach_pasid_table() we program STE S1 related info set
>> by the guest into the actual physical STEs. At minimum
>> we need to program the context descriptor GPA and compute
>> whether the stage1 is translated/bypassed or aborted.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v7 -> v8:
>> - remove smmu->features check, now done on domain finalize
>>
>> v6 -> v7:
>> - check versions and comment the fact we don't need to take
>>   into account s1dss and s1fmt
>> v3 -> v4:
>> - adapt to changes in iommu_pasid_table_config
>> - different programming convention at s1_cfg/s2_cfg/ste.abort
>>
>> v2 -> v3:
>> - callback now is named set_pasid_table and struct fields
>>   are laid out differently.
>>
>> v1 -> v2:
>> - invalidate the STE before changing them
>> - hold init_mutex
>> - handle new fields
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +
>>  1 file changed, 89 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 412ea1bafa50..805acdc18a3a 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2661,6 +2661,93 @@ static void arm_smmu_get_resv_regions(struct device 
>> *dev,
>>  iommu_dma_get_resv_regions(dev, head);
>>  }
>>  
>> +static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
>> +   struct iommu_pasid_table_config *cfg)
>> +{
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct arm_smmu_master *master;
>> +struct arm_smmu_device *smmu;
>> +unsigned long flags;
>> +int ret = -EINVAL;
>> +
>> +if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
>> +return -EINVAL;
>> +
>> +if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
>> +cfg->vendor_data.smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
>> +return -EINVAL;
>> +
>> +mutex_lock(&smmu_domain->init_mutex);
>> +
>> +smmu = smmu_domain->smmu;
>> +
>> +if (!smmu)
>> +goto out;
>> +
>> +if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +goto out;
>> +
>> +switch (cfg->config) {
>> +case IOMMU_PASID_CONFIG_ABORT:
>> +smmu_domain->s1_cfg.set = false;
>> +smmu_domain->abort = true;
>> +break;
>> +case IOMMU_PASID_CONFIG_BYPASS:
>> +smmu_domain->s1_cfg.set = false;
>> +smmu_domain->abort = false;
> I didn't test it, but it seems that this will cause BUG() in 
> arm_smmu_write_strtab_ent().
> At the line "BUG_ON(ste_live && !nested);". Maybe I miss something?

No, you are fully correct. Shameer hit the BUG_ON() when booting the
guest with iommu.passthrough=1. So I removed the BUG_ON(). The legacy
BUG_ON(ste_live) is still there in the form of BUG_ON(s1_live).

Thanks!

Eric


> 
>> +break;
>> +case IOMMU_PASID_CONFIG_TRANSLATE:
>> +/* we do not support S1 <-> S1 transitions */
>> +if (smmu_domain->s1_cfg.set)
>> +goto out;
>> +
>> +/*
>> + * we currently support a single CD so s1fmt and s1dss
>> + * fields are also ignored
>> + */
>> +if (cfg->pasid_bits)
>> +goto out;
>> +
>> +smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> +smmu_domain->s1_cfg.set = true;
>> +smmu_domain->abort = false;
>> +break;
>> +default:
>> +goto out;
>> +}
>> +spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +arm_smmu_install_ste_for_dev(master);
>> +spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +ret = 0;
>> +out:
>> +mutex_unlock(&smmu_domain->init_mutex);
>> +return ret;
>> +}
>> +
> [...]
> 
> Thanks,
> Keqian
> 



Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Auger Eric
Hi Keqian,

On 2/1/21 1:26 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> From: Jean-Philippe Brucker 
>>
>> When handling faults from the event or PRI queue, we need to find the
>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>
>> Signed-off-by: Jean-Philippe Brucker 
> [...]
> 
>>  }
>>  
>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>> +  struct arm_smmu_master *master)
>> +{
>> +int i;
>> +int ret = 0;
>> +struct arm_smmu_stream *new_stream, *cur_stream;
>> +struct rb_node **new_node, *parent_node = NULL;
>> +struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> +master->streams = kcalloc(fwspec->num_ids,
>> +  sizeof(struct arm_smmu_stream), GFP_KERNEL);
>> +if (!master->streams)
>> +return -ENOMEM;
>> +master->num_streams = fwspec->num_ids;
> This is not rolled back on failure.
> 
>> +
>> +mutex_lock(&smmu->streams_mutex);
>> +for (i = 0; i < fwspec->num_ids && !ret; i++) {
> Checking ret here makes it hard to decide the start index of the rollback.
> 
> If we fail here, then the start index is (i-2).
> If we fail in the loop, then the start index is (i-1).
> 
>> +u32 sid = fwspec->ids[i];
>> +
>> +new_stream = &master->streams[i];
>> +new_stream->id = sid;
>> +new_stream->master = master;
>> +
>> +/*
>> + * Check the SIDs are in range of the SMMU and our stream table
>> + */
>> +if (!arm_smmu_sid_in_range(smmu, sid)) {
>> +ret = -ERANGE;
>> +break;
>> +}
>> +
>> +/* Ensure l2 strtab is initialised */
>> +if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>> +ret = arm_smmu_init_l2_strtab(smmu, sid);
>> +if (ret)
>> +break;
>> +}
>> +
>> +/* Insert into SID tree */
>> +new_node = &(smmu->streams.rb_node);
>> +while (*new_node) {
>> +cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
>> +  node);
>> +parent_node = *new_node;
>> +if (cur_stream->id > new_stream->id) {
>> +new_node = &((*new_node)->rb_left);
>> +} else if (cur_stream->id < new_stream->id) {
>> +new_node = &((*new_node)->rb_right);
>> +} else {
>> +dev_warn(master->dev,
>> + "stream %u already in tree\n",
>> + cur_stream->id);
>> +ret = -EINVAL;
>> +break;
>> +}
>> +}
>> +
>> +if (!ret) {
>> +rb_link_node(&new_stream->node, parent_node, new_node);
>> +rb_insert_color(&new_stream->node, &smmu->streams);
>> +}
>> +}
>> +
>> +if (ret) {
>> +for (; i > 0; i--)
> Should this be (i >= 0)?
> And the start index does not seem correct.
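
For what it is worth, one way to make the unwind index unambiguous is to leave
the loop at the failing index i, so that exactly the entries [0, i) need
erasing. A minimal sketch of that pattern, assuming a toy ID set in place of
the rb-tree (all names below are hypothetical and this is not the actual fix):

```c
#include <stdbool.h>
#include <stddef.h>

#define REG_MAX 16

/* Toy stand-in for the SID rb-tree: a small set of registered IDs. */
struct sid_registry {
	unsigned int ids[REG_MAX];
	size_t count;
};

static bool reg_contains(const struct sid_registry *r, unsigned int id)
{
	for (size_t i = 0; i < r->count; i++)
		if (r->ids[i] == id)
			return true;
	return false;
}

/* Returns 0 on success, -1 if the ID is a duplicate or the set is full. */
static int reg_insert(struct sid_registry *r, unsigned int id)
{
	if (r->count == REG_MAX || reg_contains(r, id))
		return -1;
	r->ids[r->count++] = id;
	return 0;
}

static void reg_erase(struct sid_registry *r, unsigned int id)
{
	for (size_t i = 0; i < r->count; i++) {
		if (r->ids[i] == id) {
			r->ids[i] = r->ids[--r->count];
			return;
		}
	}
}

/*
 * Insert all IDs or none. On failure at index i, entries [0, i) were
 * inserted and entry i was not, so the unwind erases exactly [0, i).
 */
static int reg_insert_all(struct sid_registry *r,
			  const unsigned int *ids, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (reg_insert(r, ids[i]))
			goto unwind;
	}
	return 0;

unwind:
	while (i--)
		reg_erase(r, ids[i]);
	return -1;
}
```

Because the failing entry is never linked, the unwind never touches it, and
entries owned by other masters are left alone.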
> 
>> +rb_erase(&master->streams[i].node, &smmu->streams);
>> +kfree(master->streams);
>> +}
>> +mutex_unlock(&smmu->streams_mutex);
>> +
>> +return ret;
>> +}
>> +
>> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
>> +{
>> +int i;
>> +struct arm_smmu_device *smmu = master->smmu;
>> +struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> +if (!smmu || !master->streams)
>> +return;
>> +
>> +mutex_lock(&smmu->streams_mutex);
>> +for (i = 0; i < fwspec->num_ids; i++)
>> +rb_erase(&master->streams[i].node, &smmu->streams);
>> +mutex_unlock(&smmu->streams_mutex);
>> +
>> +kfree(master->streams);
>> +}
>> +
>>  static struct iommu_ops arm_smmu_ops;
>>  
>>  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>>  {
>> -int i, ret;
>> +int ret;
>>  struct arm_smmu_device *smmu;
>>  struct arm_smmu_master *master;
>>  struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> @@ -2331,27 +2447,12 @@ static struct iommu_device 
>> *arm_smmu_probe_device(struct device *dev)
>>  
>>  master->dev = dev;
>>  master->smmu = smmu;
>> -master->sids = fwspec->ids;
>> -master->num_sids = fwspec->num_ids;
>>  INIT_LIST_HEAD(&master->bonds);
>>  dev_iommu_priv_set(dev, master);
>>  
>> -/* Check the SIDs are in range of the SMMU and our stream table */
>> -for (i = 0; i < master->num_sids; i++) {
>> -u32 sid = master->sids[i];
>> -
>> -if (!arm_smmu_sid_in_range(smmu, sid)) {
>> -ret = -ERANGE;
>> -goto err_free_master;
>> -}
>> -
>> -/* Ensure l2 strtab is initialised */
>> -

Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API

2021-02-01 Thread Auger Eric
Hi Keqian,

On 2/1/21 12:27 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v12 -> v13:
>> - Fix config check
>>
>> v11 -> v12:
>> - add argsz, name the union
>> ---
>>  drivers/iommu/iommu.c  | 68 ++
>>  include/linux/iommu.h  | 21 
>>  include/uapi/linux/iommu.h | 54 ++
>>  3 files changed, 143 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b53446bb8c6b..978fe34378fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain 
>> *domain, struct device *dev
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> + struct iommu_pasid_table_config *cfg)
>> +{
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +return domain->ops->attach_pasid_table(domain, cfg);
>> +}
> miss export symbol?
yes we do
> 
>> +
>> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
>> +  void __user *uinfo)
>> +{
>> +struct iommu_pasid_table_config pasid_table_data = { 0 };
>> +u32 minsz;
>> +
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +/*
>> + * No new spaces can be added before the variable sized union, the
>> + * minimum size is the offset to the union.
>> + */
>> +minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
>> +
>> +/* Copy minsz from user to get flags and argsz */
>> +if (copy_from_user(&pasid_table_data, uinfo, minsz))
>> +return -EFAULT;
>> +
>> +/* Fields before the variable size union are mandatory */
>> +if (pasid_table_data.argsz < minsz)
>> +return -EINVAL;
>> +
>> +/* PASID and address granu require additional info beyond minsz */
>> +if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
>> +return -EINVAL;
>> +if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
>> +pasid_table_data.argsz <
>> +offsetofend(struct iommu_pasid_table_config, 
>> vendor_data.smmuv3))
>> +return -EINVAL;
>> +
>> +/*
>> + * User might be using a newer UAPI header which has a larger data
>> + * size, we shall support the existing flags within the current
>> + * size. Copy the remaining user data _after_ minsz but not more
>> + * than the current kernel supported size.
>> + */
>> +if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
>> +   min_t(u32, pasid_table_data.argsz, 
>> sizeof(pasid_table_data)) - minsz))
>> +return -EFAULT;
>> +
>> +/* Now the argsz is validated, check the content */
>> +if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
>> +pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
>> +return -EINVAL;
>> +
>> +return domain->ops->attach_pasid_table(domain, &pasid_table_data);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +if (unlikely(!domain->ops->detach_pasid_table))
>> +return;
>> +
>> +domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index b95a6f8db6ff..464fcbecf841 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>>   * @cache_invalidate: invalidate translation caches
>>   * @sva_bind_gpasid: bind guest pasid and mm
>>   * @sva_unbind_gpasid: unbind guest pasid and mm
>> + * @attach_pasid_table: attach a pasid table
>> + * 
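
As a rough illustration of the argsz handling quoted above, the two-step copy (mandatory header first, then as much of the union as the caller declared) can be tried out in user space. This is only a sketch under stated assumptions: `struct demo_cfg`, its field layout, and `demo_copy_cfg()` are made up for this note, and memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: fixed header followed by a variable-size union. */
struct demo_cfg {
	uint32_t argsz;		/* structure size as known to the caller */
	uint32_t version;
	uint32_t format;
	uint32_t config;
	union {
		struct { uint64_t s1ptr; } v1;
		uint8_t pad[32];
	} vendor_data;
};

/*
 * memcpy() stands in for copy_from_user(); src must hold at least
 * min(argsz, sizeof(struct demo_cfg)) bytes.
 */
int demo_copy_cfg(struct demo_cfg *dst, const void *src)
{
	const size_t minsz = offsetof(struct demo_cfg, vendor_data);
	size_t total;

	memset(dst, 0, sizeof(*dst));

	/* Step 1: copy only the mandatory header to learn argsz. */
	memcpy(dst, src, minsz);
	if (dst->argsz < minsz)
		return -EINVAL;

	/* Step 2: copy the rest, capped at the size this code knows about. */
	total = dst->argsz < sizeof(*dst) ? dst->argsz : sizeof(*dst);
	memcpy((uint8_t *)dst + minsz, (const uint8_t *)src + minsz,
	       total - minsz);
	return 0;
}

/* Usage example (hypothetical values). */
void demo_copy_cfg_example(void)
{
	struct demo_cfg in = { .argsz = sizeof(in), .version = 1 }, out;

	in.vendor_data.v1.s1ptr = 0x1000;
	assert(demo_copy_cfg(&out, &in) == 0);
	assert(out.vendor_data.v1.s1ptr == 0x1000);
}
```

A caller built against a newer, larger header still works because only the part known to the receiver is copied; a caller that declares less than the mandatory header is rejected.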

Re: [PATCH v12 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-02-01 Thread Auger Eric
Hi Jean,

On 2/1/21 12:12 PM, Jean-Philippe Brucker wrote:
> On Sun, Jan 31, 2021 at 07:29:09PM +0100, Auger Eric wrote:
>> Hi Jean,
>>
>> Some rather minor comments/questions below that may not justify a respin.
>>
>> On 1/27/21 4:43 PM, Jean-Philippe Brucker wrote:
>>> -static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
>>> +bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master)
>>>  {
>>> -   return false;
>>> +   /* We're not keeping track of SIDs in fault events */
>> shall we? [*] below
> 
> That would require storing the incoming SID into the iommu_fault_event
> struct, and retrieve it in arm_smmu_page_response(). Easy enough, but I
> don't think it's needed for existing devices.
OK
> 
>>> +   if (master->num_streams != 1)
>>> +   return false;
> [...]
>>> +static int arm_smmu_page_response(struct device *dev,
>>> + struct iommu_fault_event *unused,
>>> + struct iommu_page_response *resp)
>>> +{
>>> +   struct arm_smmu_cmdq_ent cmd = {0};
>>> +   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>>> +   int sid = master->streams[0].id;
>> [*]
>>> +
>>> +   if (master->stall_enabled) {
>>> +   cmd.opcode  = CMDQ_OP_RESUME;
>>> +   cmd.resume.sid  = sid;
>>> +   cmd.resume.stag = resp->grpid;
>>> +   switch (resp->code) {
>>> +   case IOMMU_PAGE_RESP_INVALID:
>> add fallthrough?
> 
> I think fallthrough is mainly useful to tell reader and compiler that a
> break was omitted on purpose. When two cases are stuck together the intent
> to merge the flow is clear enough in my opinion. GCC's
> -Wimplicit-fallthrough doesn't warn in this case.
OK
> 
>>> +   case IOMMU_PAGE_RESP_FAILURE:
>>> +   cmd.resume.resp = CMDQ_RESUME_0_RESP_ABORT;
>>> +   break;
> [...]
>>> +static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
>>> +{
>>> +   int ret;
>>> +   u32 reason;
>>> +   u32 perm = 0;
>>> +   struct arm_smmu_master *master;
>>> +   bool ssid_valid = evt[0] & EVTQ_0_SSV;
>>> +   u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
>>> +   struct iommu_fault_event fault_evt = { };
>>> +   struct iommu_fault *flt = &fault_evt.fault;
>>> +
>>> +   /* Stage-2 is always pinned at the moment */
>>> +   if (evt[1] & EVTQ_1_S2)
>>> +   return -EFAULT;
>>> +
>>> +   master = arm_smmu_find_master(smmu, sid);
>>> +   if (!master)
>>> +   return -EINVAL;
>>> +
>>> +   if (evt[1] & EVTQ_1_RnW)
>>> +   perm |= IOMMU_FAULT_PERM_READ;
>>> +   else
>>> +   perm |= IOMMU_FAULT_PERM_WRITE;
>>> +
>>> +   if (evt[1] & EVTQ_1_InD)
>>> +   perm |= IOMMU_FAULT_PERM_EXEC;
>>> +
>>> +   if (evt[1] & EVTQ_1_PnU)
>>> +   perm |= IOMMU_FAULT_PERM_PRIV;
>>> +
>>> +   switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
>>> +   case EVT_ID_TRANSLATION_FAULT:
>>> +   case EVT_ID_ADDR_SIZE_FAULT:
>>> +   case EVT_ID_ACCESS_FAULT:
>>> +   reason = IOMMU_FAULT_REASON_PTE_FETCH;
>> Doesn't it rather map to IOMMU_FAULT_REASON_ACCESS?
>> "/* access flag check failed */"
> 
> Good point, I guess it didn't exist when I wrote this. And ADDR_SIZE_FAULT
> corresponds to IOMMU_FAULT_REASON_OOR_ADDRESS now, right?
yes it does
> 
> By the way the wording on those two fault reasons, "access flag" and
> "stage", seems arch-specific - x86 names are "accessed flag" and "level".
> 
>>> +   break;
>>> +   case EVT_ID_PERMISSION_FAULT:
>>> +   reason = IOMMU_FAULT_REASON_PERMISSION;
>>> +   break;
>>> +   default:
>>> +   return -EOPNOTSUPP;
>>> +   }
>>> +
>>> +   if (evt[1] & EVTQ_1_STALL) {
>>> +   flt->type = IOMMU_FAULT_PAGE_REQ;
>>> +   flt->prm = (struct iommu_fault_page_request) {
>>> +   .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
>>> +   .grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
>>> +   .perm = perm,
>>> +   .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
>>> +   };
>>> +
>>> +   if (ssid_v
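
The mapping discussed in this exchange (ACCESS_FAULT reported as IOMMU_FAULT_REASON_ACCESS, ADDR_SIZE_FAULT as IOMMU_FAULT_REASON_OOR_ADDRESS) could look roughly like the sketch below. The event IDs are taken from the quoted patch; the `DEMO_REASON_*` values and `demo_evt_to_reason()` are placeholders invented for this note, not the kernel's definitions, and the respun patch may well differ.

```c
#include <assert.h>
#include <errno.h>

/* Event IDs as defined in the quoted patch. */
#define EVT_ID_TRANSLATION_FAULT	0x10
#define EVT_ID_ADDR_SIZE_FAULT		0x11
#define EVT_ID_ACCESS_FAULT		0x12
#define EVT_ID_PERMISSION_FAULT		0x13

/* Placeholder reasons; the real values live in the IOMMU uapi header. */
enum demo_reason {
	DEMO_REASON_PTE_FETCH = 1,
	DEMO_REASON_OOR_ADDRESS,
	DEMO_REASON_ACCESS,
	DEMO_REASON_PERMISSION,
};

/* One reason per event ID, instead of folding three IDs into PTE_FETCH. */
int demo_evt_to_reason(unsigned int evt_id, enum demo_reason *reason)
{
	switch (evt_id) {
	case EVT_ID_TRANSLATION_FAULT:
		*reason = DEMO_REASON_PTE_FETCH;
		break;
	case EVT_ID_ADDR_SIZE_FAULT:
		*reason = DEMO_REASON_OOR_ADDRESS;
		break;
	case EVT_ID_ACCESS_FAULT:
		*reason = DEMO_REASON_ACCESS;
		break;
	case EVT_ID_PERMISSION_FAULT:
		*reason = DEMO_REASON_PERMISSION;
		break;
	default:
		return -EOPNOTSUPP;
	}
	return 0;
}

/* Usage example. */
void demo_evt_to_reason_example(void)
{
	enum demo_reason reason;

	assert(demo_evt_to_reason(EVT_ID_ACCESS_FAULT, &reason) == 0);
	assert(reason == DEMO_REASON_ACCESS);
}
```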

Re: [PATCH v12 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA

2021-01-31 Thread Auger Eric
Hi Jean,

On 1/27/21 4:43 PM, Jean-Philippe Brucker wrote:
> Some devices manage I/O Page Faults (IOPF) themselves instead of relying
> on PCIe PRI or Arm SMMU stall. Allow their drivers to enable SVA without
> mandating IOMMU-managed IOPF. The other device drivers now need to first
> enable IOMMU_DEV_FEAT_IOPF before enabling IOMMU_DEV_FEAT_SVA. Enabling
> IOMMU_DEV_FEAT_IOPF on its own doesn't have any effect visible to the
> device driver, it is used in combination with other features.
> 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Eric

> ---
> Cc: Arnd Bergmann 
> Cc: David Woodhouse 
> Cc: Greg Kroah-Hartman 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Cc: Will Deacon 
> Cc: Zhangfei Gao 
> Cc: Zhou Wang 
> ---
>  include/linux/iommu.h | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b7ea11fc1a93..00348e4c3c26 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -156,10 +156,24 @@ struct iommu_resv_region {
>   enum iommu_resv_type type;
>  };
>  
> -/* Per device IOMMU features */
> +/**
> + * enum iommu_dev_features - Per device IOMMU features
> + * @IOMMU_DEV_FEAT_AUX: Auxiliary domain feature
> + * @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses
> + * @IOMMU_DEV_FEAT_IOPF: I/O Page Faults such as PRI or Stall. Generally
> + *enabling %IOMMU_DEV_FEAT_SVA requires
> + *%IOMMU_DEV_FEAT_IOPF, but some devices manage I/O Page
> + *Faults themselves instead of relying on the IOMMU. When
> + *supported, this feature must be enabled before and
> + *disabled after %IOMMU_DEV_FEAT_SVA.
> + *
> + * Device drivers query whether a feature is supported using
> + * iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature().
> + */
>  enum iommu_dev_features {
> - IOMMU_DEV_FEAT_AUX, /* Aux-domain feature */
> - IOMMU_DEV_FEAT_SVA, /* Shared Virtual Addresses */
> + IOMMU_DEV_FEAT_AUX,
> + IOMMU_DEV_FEAT_SVA,
> + IOMMU_DEV_FEAT_IOPF,
>  };
>  
>  #define IOMMU_PASID_INVALID  (-1U)
> 
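
The ordering rule in the kernel-doc comment above (IOPF, when supported, is enabled before and disabled after SVA) can be modelled with a small state check. This is a sketch only: `struct demo_dev`, `demo_enable_feature()`, `demo_disable_feature()` and the error codes are assumptions made for illustration and do not reproduce iommu_dev_enable_feature().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum demo_feat { DEMO_FEAT_SVA, DEMO_FEAT_IOPF };

struct demo_dev {
	bool iommu_iopf;	/* true if the device relies on IOMMU-managed IOPF */
	bool iopf_enabled;
	bool sva_enabled;
};

int demo_enable_feature(struct demo_dev *dev, enum demo_feat feat)
{
	switch (feat) {
	case DEMO_FEAT_IOPF:
		if (!dev->iommu_iopf)
			return -ENODEV;	/* device handles faults itself */
		dev->iopf_enabled = true;
		return 0;
	case DEMO_FEAT_SVA:
		/* IOPF, when supported, has to be enabled first. */
		if (dev->iommu_iopf && !dev->iopf_enabled)
			return -EINVAL;
		dev->sva_enabled = true;
		return 0;
	}
	return -EINVAL;
}

int demo_disable_feature(struct demo_dev *dev, enum demo_feat feat)
{
	switch (feat) {
	case DEMO_FEAT_IOPF:
		/* IOPF has to stay enabled for as long as SVA is. */
		if (dev->iopf_enabled && dev->sva_enabled)
			return -EBUSY;
		dev->iopf_enabled = false;
		return 0;
	case DEMO_FEAT_SVA:
		dev->sva_enabled = false;
		return 0;
	}
	return -EINVAL;
}

/* Usage example: the order a driver would follow. */
void demo_feature_order_example(void)
{
	struct demo_dev dev = { .iommu_iopf = true };

	assert(demo_enable_feature(&dev, DEMO_FEAT_IOPF) == 0);
	assert(demo_enable_feature(&dev, DEMO_FEAT_SVA) == 0);
	assert(demo_disable_feature(&dev, DEMO_FEAT_SVA) == 0);
	assert(demo_disable_feature(&dev, DEMO_FEAT_IOPF) == 0);
}
```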

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 02/10] iommu/arm-smmu-v3: Use device properties for pasid-num-bits

2021-01-31 Thread Auger Eric
Hi,

On 1/27/21 4:43 PM, Jean-Philippe Brucker wrote:
> The pasid-num-bits property shouldn't need a dedicated fwspec field,
> it's a job for device properties. Add properties for IORT, and access
> the number of PASID bits using device_property_read_u32().
> 
> Suggested-by: Robin Murphy 
> Acked-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Eric

> ---
>  include/linux/iommu.h   |  2 --
>  drivers/acpi/arm64/iort.c   | 13 +++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  3 ++-
>  drivers/iommu/of_iommu.c|  5 -
>  4 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index bdf3f34a4457..b7ea11fc1a93 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -571,7 +571,6 @@ struct iommu_group *fsl_mc_device_group(struct device 
> *dev);
>   * @ops: ops for this device's IOMMU
>   * @iommu_fwnode: firmware handle for this device's IOMMU
>   * @flags: IOMMU_FWSPEC_* flags
> - * @num_pasid_bits: number of PASID bits supported by this device
>   * @num_ids: number of associated device IDs
>   * @ids: IDs which this device may present to the IOMMU
>   */
> @@ -579,7 +578,6 @@ struct iommu_fwspec {
>   const struct iommu_ops  *ops;
>   struct fwnode_handle    *iommu_fwnode;
>   u32                     flags;
> - u32                     num_pasid_bits;
>   unsigned int            num_ids;
>   u32                     ids[];
>  };
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index d4eac6d7e9fb..c9a8bbb74b09 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -968,15 +968,16 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, 
> u16 alias, void *data)
>  static void iort_named_component_init(struct device *dev,
> struct acpi_iort_node *node)
>  {
> + struct property_entry props[2] = {};
>   struct acpi_iort_named_component *nc;
> - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> -
> - if (!fwspec)
> - return;
>  
>   nc = (struct acpi_iort_named_component *)node->node_data;
> - fwspec->num_pasid_bits = FIELD_GET(ACPI_IORT_NC_PASID_BITS,
> -nc->node_flags);
> + props[0] = PROPERTY_ENTRY_U32("pasid-num-bits",
> +   FIELD_GET(ACPI_IORT_NC_PASID_BITS,
> + nc->node_flags));
> +
> + if (device_add_properties(dev, props))
> + dev_warn(dev, "Could not add device properties\n");
>  }
>  
>  static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node)
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index baebaac34a83..88dd9feb32f4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2392,7 +2392,8 @@ static struct iommu_device 
> *arm_smmu_probe_device(struct device *dev)
>   }
>   }
>  
> - master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
> + device_property_read_u32(dev, "pasid-num-bits", &master->ssid_bits);
> + master->ssid_bits = min(smmu->ssid_bits, master->ssid_bits);
>  
>   /*
>* Note that PASID must be enabled before, and disabled after ATS:
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index e505b9130a1c..a9d2df001149 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -210,11 +210,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
> *dev,
>of_pci_iommu_init, &info);
>   } else {
>   err = of_iommu_configure_device(master_np, dev, id);
> -
> - fwspec = dev_iommu_fwspec_get(dev);
> - if (!err && fwspec)
> - of_property_read_u32(master_np, "pasid-num-bits",
> -  &fwspec->num_pasid_bits);
>   }
>  
>   /*
> 
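
To make the effect of the arm-smmu-v3 hunk above concrete: the device's PASID width is read from a property and then clamped to what the SMMU supports. The sketch below models that in user space. `struct demo_prop`, `demo_property_read_u32()` and `demo_ssid_bits()` are hypothetical stand-ins, and it assumes that a failed read leaves the output value untouched and that the value starts out as 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical property store: a flat array of name/value pairs. */
struct demo_prop {
	const char *name;
	uint32_t value;
};

int demo_property_read_u32(const struct demo_prop *props, size_t n,
			   const char *name, uint32_t *val)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) == 0) {
			*val = props[i].value;
			return 0;
		}
	}
	return -EINVAL;	/* *val is left untouched */
}

uint32_t demo_ssid_bits(uint32_t smmu_bits, const struct demo_prop *props,
			size_t n)
{
	uint32_t dev_bits = 0;	/* stays 0 (no PASID) if the property is absent */

	demo_property_read_u32(props, n, "pasid-num-bits", &dev_bits);
	return dev_bits < smmu_bits ? dev_bits : smmu_bits;
}

/* Usage example: a 16-bit device behind an SMMU supporting 20 bits. */
void demo_ssid_bits_example(void)
{
	const struct demo_prop props[] = { { "pasid-num-bits", 16 } };

	assert(demo_ssid_bits(20, props, 1) == 16);
}
```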



Re: [PATCH v12 08/10] dt-bindings: document stall property for IOMMU masters

2021-01-31 Thread Auger Eric
Hi Jean-Philippe,

On 1/27/21 4:43 PM, Jean-Philippe Brucker wrote:
> On ARM systems, some platform devices behind an IOMMU may support stall,
> which is the ability to recover from page faults. Let the firmware tell us
> when a device supports stall.
> 
> Reviewed-by: Rob Herring 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Eric
> ---
>  .../devicetree/bindings/iommu/iommu.txt| 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt 
> b/Documentation/devicetree/bindings/iommu/iommu.txt
> index 3c36334e4f94..26ba9e530f13 100644
> --- a/Documentation/devicetree/bindings/iommu/iommu.txt
> +++ b/Documentation/devicetree/bindings/iommu/iommu.txt
> @@ -92,6 +92,24 @@ Optional properties:
>tagging DMA transactions with an address space identifier. By default,
>this is 0, which means that the device only has one address space.
>  
> +- dma-can-stall: When present, the master can wait for a transaction to
> +  complete for an indefinite amount of time. Upon translation fault some
> +  IOMMUs, instead of aborting the translation immediately, may first
> +  notify the driver and keep the transaction in flight. This allows the OS
> +  to inspect the fault and, for example, make physical pages resident
> +  before updating the mappings and completing the transaction. Such IOMMU
> +  accepts a limited number of simultaneous stalled transactions before
> +  having to either put back-pressure on the master, or abort new faulting
> +  transactions.
> +
> +  Firmware has to opt-in stalling, because most buses and masters don't
> +  support it. In particular it isn't compatible with PCI, where
> +  transactions have to complete before a time limit. More generally it
> +  won't work in systems and masters that haven't been designed for
> +  stalling. For example the OS, in order to handle a stalled transaction,
> +  may attempt to retrieve pages from secondary storage in a stalled
> +  domain, leading to a deadlock.
> +
>  
>  Notes:
>  ==
> 



Re: [PATCH v12 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-01-31 Thread Auger Eric
Hi Jean,

Some rather minor comments/questions below that may not justify a respin.

On 1/27/21 4:43 PM, Jean-Philippe Brucker wrote:
> The SMMU provides a Stall model for handling page faults in platform
> devices. It is similar to PCIe PRI, but doesn't require devices to have
> their own translation cache. Instead, faulting transactions are parked
> and the OS is given a chance to fix the page tables and retry the
> transaction.
> 
> Enable stall for devices that support it (opt-in by firmware). When an
> event corresponds to a translation error, call the IOMMU fault handler.
> If the fault is recoverable, it will call us back to terminate or
> continue the stall.
> 
> To use stall device drivers need to enable IOMMU_DEV_FEAT_IOPF, which
> initializes the fault queue for the device.
> 
> Tested-by: Zhangfei Gao 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  43 
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  59 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 189 +-
>  3 files changed, 276 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 7b15b7580c6e..59af0bbd2f7b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -354,6 +354,13 @@
>  #define CMDQ_PRI_1_GRPID GENMASK_ULL(8, 0)
>  #define CMDQ_PRI_1_RESP  GENMASK_ULL(13, 12)
>  
> +#define CMDQ_RESUME_0_RESP_TERM  0UL
> +#define CMDQ_RESUME_0_RESP_RETRY 1UL
> +#define CMDQ_RESUME_0_RESP_ABORT 2UL
> +#define CMDQ_RESUME_0_RESP   GENMASK_ULL(13, 12)
> +#define CMDQ_RESUME_0_SID    GENMASK_ULL(63, 32)
> +#define CMDQ_RESUME_1_STAG   GENMASK_ULL(15, 0)
> +
>  #define CMDQ_SYNC_0_CS   GENMASK_ULL(13, 12)
>  #define CMDQ_SYNC_0_CS_NONE  0
>  #define CMDQ_SYNC_0_CS_IRQ   1
> @@ -370,6 +377,25 @@
>  
>  #define EVTQ_0_ID GENMASK_ULL(7, 0)
>  
> +#define EVT_ID_TRANSLATION_FAULT 0x10
> +#define EVT_ID_ADDR_SIZE_FAULT   0x11
> +#define EVT_ID_ACCESS_FAULT  0x12
> +#define EVT_ID_PERMISSION_FAULT  0x13
> +
> +#define EVTQ_0_SSV   (1UL << 11)
> +#define EVTQ_0_SSID  GENMASK_ULL(31, 12)
> +#define EVTQ_0_SID   GENMASK_ULL(63, 32)
> +#define EVTQ_1_STAG  GENMASK_ULL(15, 0)
> +#define EVTQ_1_STALL (1UL << 31)
> +#define EVTQ_1_PnU   (1UL << 33)
> +#define EVTQ_1_InD   (1UL << 34)
> +#define EVTQ_1_RnW   (1UL << 35)
> +#define EVTQ_1_S2    (1UL << 39)
> +#define EVTQ_1_CLASS GENMASK_ULL(41, 40)
> +#define EVTQ_1_TT_READ   (1UL << 44)
> +#define EVTQ_2_ADDR  GENMASK_ULL(63, 0)
> +#define EVTQ_3_IPA   GENMASK_ULL(51, 12)
> +
>  /* PRI queue */
>  #define PRIQ_ENT_SZ_SHIFT 4
>  #define PRIQ_ENT_DWORDS  ((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
> @@ -464,6 +490,13 @@ struct arm_smmu_cmdq_ent {
>   enum pri_resp   resp;
>   } pri;
>  
> + #define CMDQ_OP_RESUME  0x44
> + struct {
> + u32 sid;
> + u16 stag;
> + u8  resp;
> + } resume;
> +
>   #define CMDQ_OP_CMD_SYNC 0x46
>   struct {
>   u64 msiaddr;
> @@ -522,6 +555,7 @@ struct arm_smmu_cmdq_batch {
>  
>  struct arm_smmu_evtq {
>   struct arm_smmu_queue   q;
> + struct iopf_queue   *iopf;
>   u32 max_stalls;
>  };
>  
> @@ -659,7 +693,9 @@ struct arm_smmu_master {
>   struct arm_smmu_stream  *streams;
>   unsigned int            num_streams;
>   bool                    ats_enabled;
> + bool                    stall_enabled;
>   bool                    sva_enabled;
> + bool                    iopf_enabled;
>   struct list_head        bonds;
>   unsigned int            ssid_bits;
>  };
> @@ -678,6 +714,7 @@ struct arm_smmu_domain {
>  
>   struct io_pgtable_ops   *pgtbl_ops;
>   bool                    non_strict;
> + bool                    stall_enabled;
>   atomic_t                nr_ats_masters;
>  
>   enum arm_smmu_domain_stage  stage;
> @@ -719,6 +756,7 @@ bool arm_smmu_master_sva_supported(struct arm_smmu_master 
> *master);
>  bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
>  int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
>  int 
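
For readers unfamiliar with the GENMASK_ULL()/FIELD_GET() style used by the event-queue definitions above, here is a small user-space sketch of how fields are pulled out of an event record. The bit positions are the ones in the quoted patch; the `DEMO_*` macros, `struct demo_evt` and `demo_decode_evt()` are stand-ins written for this note, not the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's GENMASK_ULL() and FIELD_GET(). */
#define DEMO_GENMASK_ULL(h, l)	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))
/* Dividing by the lowest set bit of the mask shifts the field down to bit 0. */
#define DEMO_FIELD_GET(mask, reg) \
	(((reg) & (mask)) / ((mask) & (~(mask) + 1)))

/* Bit positions as in the quoted patch. */
#define DEMO_EVTQ_0_ID		DEMO_GENMASK_ULL(7, 0)
#define DEMO_EVTQ_0_SSV		(1ULL << 11)
#define DEMO_EVTQ_0_SSID	DEMO_GENMASK_ULL(31, 12)
#define DEMO_EVTQ_0_SID		DEMO_GENMASK_ULL(63, 32)
#define DEMO_EVTQ_1_STAG	DEMO_GENMASK_ULL(15, 0)
#define DEMO_EVTQ_1_STALL	(1ULL << 31)

struct demo_evt {
	uint8_t  id;
	bool     ssid_valid;
	uint32_t ssid;
	uint32_t sid;
	uint16_t stag;
	bool     stall;
};

void demo_decode_evt(const uint64_t evt[2], struct demo_evt *out)
{
	out->id         = (uint8_t)DEMO_FIELD_GET(DEMO_EVTQ_0_ID, evt[0]);
	out->ssid_valid = (evt[0] & DEMO_EVTQ_0_SSV) != 0;
	out->ssid       = (uint32_t)DEMO_FIELD_GET(DEMO_EVTQ_0_SSID, evt[0]);
	out->sid        = (uint32_t)DEMO_FIELD_GET(DEMO_EVTQ_0_SID, evt[0]);
	out->stag       = (uint16_t)DEMO_FIELD_GET(DEMO_EVTQ_1_STAG, evt[1]);
	out->stall      = (evt[1] & DEMO_EVTQ_1_STALL) != 0;
}

/* Usage example: a stalled translation fault from SID 0x2a, SSID 5. */
void demo_decode_evt_example(void)
{
	const uint64_t evt[2] = {
		(0x2aULL << 32) | (5ULL << 12) | (1ULL << 11) | 0x10,
		(1ULL << 31) | 0x7,
	};
	struct demo_evt e;

	demo_decode_evt(evt, &e);
	assert(e.id == 0x10 && e.sid == 0x2a && e.ssid == 5 && e.stag == 7);
}
```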

Re: [PATCH v12 06/10] iommu: Add a page fault handler

2021-01-31 Thread Auger Eric
Hi Jean,
On 1/27/21 4:43 PM, Jean-Philippe Brucker wrote:
> Some systems allow devices to handle I/O Page Faults in the core mm. For
> example systems implementing the PCIe PRI extension or Arm SMMU stall
> model. Infrastructure for reporting these recoverable page faults was
> added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device
> fault report API"). Add a page fault handler for host SVA.
> 
> IOMMU driver can now instantiate several fault workqueues and link them
> to IOPF-capable devices. Drivers can choose between a single global
> workqueue, one per IOMMU device, one per low-level fault queue, one per
> domain, etc.
> 
> When it receives a fault event, most commonly in an IRQ handler, the
> IOMMU driver reports the fault using iommu_report_device_fault(), which
> calls the registered handler. The page fault handler then calls the mm
> fault handler, and reports either success or failure with
> iommu_page_response(). After the handler succeeds, the hardware retries
> the access.
> 
> The iopf_param pointer could be embedded into iommu_fault_param. But
> putting iopf_param into the iommu_param structure allows us not to care
> about ordering between calls to iopf_queue_add_device() and
> iommu_register_device_fault_handler().
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/iommu-sva-lib.h |  53 
>  include/linux/iommu.h |   2 +
>  drivers/iommu/io-pgfault.c| 461 ++
>  4 files changed, 517 insertions(+)
>  create mode 100644 drivers/iommu/io-pgfault.c
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 61bd30cd8369..60fafc23dee6 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
>  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
> +obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o
> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
> index b40990aef3fd..031155010ca8 100644
> --- a/drivers/iommu/iommu-sva-lib.h
> +++ b/drivers/iommu/iommu-sva-lib.h
> @@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t 
> min, ioasid_t max);
>  void iommu_sva_free_pasid(struct mm_struct *mm);
>  struct mm_struct *iommu_sva_find(ioasid_t pasid);
>  
> +/* I/O Page fault */
> +struct device;
> +struct iommu_fault;
> +struct iopf_queue;
> +
> +#ifdef CONFIG_IOMMU_SVA_LIB
> +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
> +
> +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
> +int iopf_queue_remove_device(struct iopf_queue *queue,
> +  struct device *dev);
> +int iopf_queue_flush_dev(struct device *dev);
> +struct iopf_queue *iopf_queue_alloc(const char *name);
> +void iopf_queue_free(struct iopf_queue *queue);
> +int iopf_queue_discard_partial(struct iopf_queue *queue);
> +
> +#else /* CONFIG_IOMMU_SVA_LIB */
> +static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_add_device(struct iopf_queue *queue,
> + struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_remove_device(struct iopf_queue *queue,
> +struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_flush_dev(struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline struct iopf_queue *iopf_queue_alloc(const char *name)
> +{
> + return NULL;
> +}
> +
> +static inline void iopf_queue_free(struct iopf_queue *queue)
> +{
> +}
> +
> +static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
> +{
> + return -ENODEV;
> +}
> +#endif /* CONFIG_IOMMU_SVA_LIB */
>  #endif /* _IOMMU_SVA_LIB_H */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 00348e4c3c26..edc9be443a74 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -366,6 +366,7 @@ struct iommu_fault_param {
>   * struct dev_iommu - Collection of per-device IOMMU data
>   *
>   * @fault_param: IOMMU detected device fault reporting data
> + * @iopf_param:   I/O Page Fault queue and data
>   * @fwspec:   IOMMU fwspec data
>   * @iommu_dev:IOMMU device this device is linked to
>   * @priv: IOMMU Driver private data
> @@ -376,6 +377,7 @@ struct iommu_fault_param {
>  struct dev_iommu {
>   struct mutex lock;
>   struct iommu_fault_param        *fault_param;
> + struct iopf_device_param        *iopf_param;
>   struct iommu_fwspec             *fwspec;
>   struct iommu_device             *iommu_dev;
>   void                            *priv;
> diff --git a/drivers/iommu/io-pgfault.c 
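
The flow described in the commit message above (fault reported, handler calls into the mm, success or failure sent back as a page response) can be reduced to the following sketch. Everything here is a stand-in made up for illustration: `demo_mm_fault()` is a fake mm handler that accepts addresses below an arbitrary limit, and `demo_handle_iopf()` is not the iommu_queue_iopf() implementation.

```c
#include <assert.h>
#include <stdint.h>

enum demo_resp { DEMO_RESP_SUCCESS, DEMO_RESP_INVALID };

struct demo_fault {
	uint64_t addr;
	uint32_t grpid;
};

struct demo_response {
	uint32_t grpid;		/* echoes the group ID of the fault */
	enum demo_resp code;
};

/* Stand-in for the mm fault handler: succeeds below an arbitrary limit. */
#define DEMO_VA_LIMIT 0x100000ULL
int demo_mm_fault(uint64_t addr)
{
	return addr < DEMO_VA_LIMIT ? 0 : -1;
}

/* Stand-in for the IOPF handler: run the mm handler, build the response. */
struct demo_response demo_handle_iopf(const struct demo_fault *fault)
{
	struct demo_response resp = { .grpid = fault->grpid };

	resp.code = demo_mm_fault(fault->addr) == 0 ?
		    DEMO_RESP_SUCCESS : DEMO_RESP_INVALID;
	return resp;
}

/* Usage example: a fault on a resolvable address is answered with success. */
void demo_handle_iopf_example(void)
{
	const struct demo_fault fault = { .addr = 0x1000, .grpid = 3 };
	struct demo_response resp = demo_handle_iopf(&fault);

	assert(resp.grpid == 3 && resp.code == DEMO_RESP_SUCCESS);
}
```

On a success response the hardware retries the access, as the commit message notes; on failure the transaction is terminated.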

Re: [PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm

2021-01-25 Thread Auger Eric
Hi Vivek,

On 1/21/21 6:34 PM, Vivek Kumar Gautam wrote:
> Hi Eric,
> 
> 
> On 1/19/21 2:33 PM, Auger Eric wrote:
>> Hi Vivek,
>>
>> On 1/15/21 1:13 PM, Vivek Gautam wrote:
>>> This patch-series aims at enabling Nested stage translation in guests
>>> using virtio-iommu as the paravirtualized iommu. The backend is
>>> supported
>>> with Arm SMMU-v3 that provides nested stage-1 and stage-2 translation.
>>>
>>> This series derives its purpose from various efforts happening to add
>>> support for Shared Virtual Addressing (SVA) in host and guest. On Arm,
>>> most of the support for SVA has already landed. The support for nested
>>> stage translation and fault reporting to guest has been proposed [1].
>>> The related changes required in VFIO [2] framework have also been put
>>> forward.
>>>
>>> This series proposes changes in virtio-iommu to program PASID tables
>>> and related stage-1 page tables. A simple iommu-pasid-table library
>>> is added for this purpose that interacts with vendor drivers to
>>> allocate and populate PASID tables.
>>> In Arm SMMUv3 we propose to pull the Context Descriptor (CD) management
>>> code out of the arm-smmu-v3 driver and add that as a glue vendor layer
>>> to support allocating CD tables, and populating them with right values.
>>> These CD tables are essentially the PASID tables and contain stage-1
>>> page table configurations too.
>>> A request to setup these CD tables come from virtio-iommu driver using
>>> the iommu-pasid-table library when running on Arm. The virtio-iommu
>>> then pass these PASID tables to the host using the right virtio backend
>>> and support in VMM.
>>>
>>> For testing we have added necessary support in kvmtool. The changes in
>>> kvmtool are based on virtio-iommu development branch by Jean-Philippe
>>> Brucker [3].
>>>
>>> The tested kernel branch contains following in the order bottom to top
>>> on the git hash -
>>> a) v5.11-rc3
>>> b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page
>>>     table support for Arm.
>>> c) Smmu test engine patches from Jean-Philippe's branch [4]
>>> d) This series
>>> e) Domain nesting info patches [5][6][7].
>>> f) Changes to add arm-smmu-v3 specific nesting info (to be sent to
>>>     the list).
>>>
>>> This kernel is tested on Neoverse reference software stack with
>>> Fixed virtual platform. Public version of the software stack and
>>> FVP is available here[8][9].
>>>
>>> A big thanks to Jean-Philippe for his contributions towards this work
>>> and for his valuable guidance.
>>>
>>> [1]
>>> https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/
>>>
>>> [2]
>>> https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.au...@redhat.com/T/
>>>
>>> [3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel
>>> [4] https://jpbrucker.net/git/linux/log/?h=sva/smmute
>>> [5]
>>> https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/
>>>
>>> [6]
>>> https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/
>>>
>>> [7]
>>> https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/
>>>
>>> [8]
>>> https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
>>>
>>> [9]
>>> https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst
>>>
>>
>> Could you share a public branch where we could find all the kernel
>> pieces.
>>
>> Thank you in advance
> 
> Apologies for the delay. It took a bit of time to sort things out for a
> public branch.
> The branch is available in my github now. Please have a look.
> 
> https://github.com/vivek-arm/linux/tree/5.11-rc3-nested-pgtbl-arm-smmuv3-virtio-iommu

no problem. Thank you for the link.

Best Regards

Eric
> 
> 
> 
> Thanks and regards
> Vivek
> 
>>
>> Best Regards
>>
>> Eric
>>>
>>> Jean-Philippe Brucker (6):
>>>    iommu/virtio: Add headers for table format probing
>>>    iommu/virtio: Add table format probing
>>>    iommu/virtio: Add headers for binding pasid table in iommu
>>>    iommu/virtio: Add support for INVALIDATE request
>>>    iommu/virtio: Attach 

Re: [PATCH v10 07/10] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-01-22 Thread Auger Eric
Hi Jean,

On 1/21/21 1:36 PM, Jean-Philippe Brucker wrote:
> When handling faults from the event or PRI queue, we need to find the
> struct device associated with a SID. Add a rb_tree to keep track of
> SIDs.
> 
> Acked-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Eric

> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  13 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 161 
>  2 files changed, 144 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 96c2e9565e00..8ef6a1c48635 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -636,6 +636,15 @@ struct arm_smmu_device {
>  
>   /* IOMMU core code handle */
>   struct iommu_device iommu;
> +
> + struct rb_root  streams;
> + struct mutex    streams_mutex;
> +};
> +
> +struct arm_smmu_stream {
> + u32 id;
> + struct arm_smmu_master  *master;
> + struct rb_node  node;
>  };
>  
>  /* SMMU private data for each master */
> @@ -644,8 +653,8 @@ struct arm_smmu_master {
>   struct device   *dev;
>   struct arm_smmu_domain  *domain;
>   struct list_head        domain_head;
> - u32                     *sids;
> - unsigned int            num_sids;
> + struct arm_smmu_stream  *streams;
> + unsigned int            num_streams;
>   bool                    ats_enabled;
>   bool                    sva_enabled;
>   struct list_head        bonds;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 6a53b4edf054..db5d6aa76c3a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -912,8 +912,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain 
> *smmu_domain,
>  
>   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>   list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.cfgi.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.cfgi.sid = master->streams[i].id;
>   arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
>   }
>   }
> @@ -1355,6 +1355,32 @@ static int arm_smmu_init_l2_strtab(struct 
> arm_smmu_device *smmu, u32 sid)
>   return 0;
>  }
>  
> +__maybe_unused
> +static struct arm_smmu_master *
> +arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
> +{
> + struct rb_node *node;
> + struct arm_smmu_stream *stream;
> + struct arm_smmu_master *master = NULL;
> +
> + mutex_lock(&smmu->streams_mutex);
> + node = smmu->streams.rb_node;
> + while (node) {
> + stream = rb_entry(node, struct arm_smmu_stream, node);
> + if (stream->id < sid) {
> + node = node->rb_right;
> + } else if (stream->id > sid) {
> + node = node->rb_left;
> + } else {
> + master = stream->master;
> + break;
> + }
> + }
> + mutex_unlock(&smmu->streams_mutex);
> +
> + return master;
> +}
> +
>  /* IRQ and event handlers */
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
> @@ -1588,8 +1614,8 @@ static int arm_smmu_atc_inv_master(struct 
> arm_smmu_master *master)
>  
>   arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
>  
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.atc.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.atc.sid = master->streams[i].id;
>   arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
>   }
>  
> @@ -1632,8 +1658,8 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain 
> *smmu_domain, int ssid,
>   if (!master->ats_enabled)
>   continue;
>  
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.atc.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.atc.sid = master->streams[i].id;
>   arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
> 
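
The lookup in arm_smmu_find_master() above is an ordinary descent through a tree ordered by stream ID. The user-space sketch below shows the same comparison logic on a plain, unbalanced binary search tree. `struct demo_stream`, `demo_insert_stream()` and `demo_find_stream()` are hypothetical names for this note, the kernel's rb_tree additionally rebalances on insert, and the locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_stream {
	uint32_t id;		/* stream ID (SID) */
	int master;		/* placeholder for the owning master */
	struct demo_stream *left, *right;
};

/* Insert a stream, keeping the tree ordered by ID; duplicates are rejected. */
int demo_insert_stream(struct demo_stream **root, struct demo_stream *new_stream)
{
	struct demo_stream **link = root;

	while (*link) {
		if ((*link)->id < new_stream->id)
			link = &(*link)->right;
		else if ((*link)->id > new_stream->id)
			link = &(*link)->left;
		else
			return -EEXIST;
	}
	new_stream->left = NULL;
	new_stream->right = NULL;
	*link = new_stream;
	return 0;
}

/* Same descent as in the quoted patch: go right if the node's ID is smaller. */
const struct demo_stream *demo_find_stream(const struct demo_stream *node,
					   uint32_t sid)
{
	while (node) {
		if (node->id < sid)
			node = node->right;
		else if (node->id > sid)
			node = node->left;
		else
			return node;
	}
	return NULL;
}

/* Usage example: register two streams and look one of them up. */
void demo_stream_example(void)
{
	struct demo_stream a = { .id = 0x20, .master = 1 };
	struct demo_stream b = { .id = 0x10, .master = 2 };
	struct demo_stream *root = NULL;

	assert(demo_insert_stream(&root, &a) == 0);
	assert(demo_insert_stream(&root, &b) == 0);
	assert(demo_find_stream(root, 0x10)->master == 2);
}
```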

Re: [PATCH v7 02/16] iommu/smmu: Report empty domain nesting info

2021-01-19 Thread Auger Eric
Hi Yi, Vivek,

On 1/13/21 6:56 AM, Liu, Yi L wrote:
> Hi Vivek,
> 
>> From: Vivek Gautam 
>> Sent: Tuesday, January 12, 2021 7:06 PM
>>
>> Hi Yi,
>>
>>
>> On Tue, Jan 12, 2021 at 2:51 PM Liu, Yi L  wrote:
>>>
>>> Hi Vivek,
>>>
>>>> From: Vivek Gautam 
>>>> Sent: Tuesday, January 12, 2021 2:50 PM
>>>>
>>>> Hi Yi,
>>>>
>>>>
>>>> On Thu, Sep 10, 2020 at 4:13 PM Liu Yi L  wrote:
>>>>>
>>>>> This patch is added as instead of returning a boolean for DOMAIN_ATTR_NESTING,
>>>>> iommu_domain_get_attr() should return an iommu_nesting_info handle. For
>>>>> now, return an empty nesting info struct for now as true nesting is not
>>>>> yet supported by the SMMUs.
>>>>>
>>>>> Cc: Will Deacon 
>>>>> Cc: Robin Murphy 
>>>>> Cc: Eric Auger 
>>>>> Cc: Jean-Philippe Brucker 
>>>>> Suggested-by: Jean-Philippe Brucker 
>>>>> Signed-off-by: Liu Yi L 
>>>>> Signed-off-by: Jacob Pan 
>>>>> Reviewed-by: Eric Auger 
>>>>> ---
>>>>> v5 -> v6:
>>>>> *) add review-by from Eric Auger.
>>>>>
>>>>> v4 -> v5:
>>>>> *) address comments from Eric Auger.
>>>>> ---
>>>>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++--
>>>>>  drivers/iommu/arm/arm-smmu/arm-smmu.c       | 29 +++--
>>>>>  2 files changed, 54 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> index 7196207..016e2e5 100644
>>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> @@ -3019,6 +3019,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
>>>>>         return group;
>>>>>  }
>>>>>
>>>>> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
>>>>> +                                        void *data)
>>>>> +{
>>>>> +       struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
>>>>> +       unsigned int size;
>>>>> +
>>>>> +       if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>>>> +               return -ENODEV;
>>>>> +
>>>>> +       size = sizeof(struct iommu_nesting_info);
>>>>> +
>>>>> +       /*
>>>>> +        * if provided buffer size is smaller than expected, should
>>>>> +        * return 0 and also the expected buffer size to caller.
>>>>> +        */
>>>>> +       if (info->argsz < size) {
>>>>> +               info->argsz = size;
>>>>> +               return 0;
>>>>> +       }
>>>>> +
>>>>> +       /* report an empty iommu_nesting_info for now */
>>>>> +       memset(info, 0x0, size);
>>>>> +       info->argsz = size;
>>>>> +       return 0;
>>>>> +}
>>>>> +
>>>>>  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>>>>>                                      enum iommu_attr attr, void *data)
>>>>>  {
>>>>> @@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>>>>>         case IOMMU_DOMAIN_UNMANAGED:
>>>>>                 switch (attr) {
>>>>>                 case DOMAIN_ATTR_NESTING:
>>>>> -                       *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
>>>>> -                       return 0;
>>>>> +                       return arm_smmu_domain_nesting_info(smmu_domain, data);
>>>>
>>>> Thanks for the patch.
>>>> This would unnecessarily overflow 'data' for any caller that's expecting only
>>>> an int data. Dump from one such issue that I was seeing when testing
>>>> this change along with local kvmtool changes is pasted below [1].
>>>>
>>>> I could get around with the issue by adding another (iommu_attr) -
>>>> DOMAIN_ATTR_NESTING_INFO that returns (iommu_nesting_info).
>>>
>>> nice to hear from you. At first, we planned to have a separate iommu_attr
>>> for getting nesting_info. However, we considered there is no existing user
>>> which gets DOMAIN_ATTR_NESTING, so we decided to reuse it for iommu nesting
>>> info. Could you share me the code base you are using? If the error you
>>> encountered is due to this change, so there should be a place which gets
>>> DOMAIN_ATTR_NESTING.
>>
>> I am currently working on top of Eric's tree for nested stage support [1].
>> My best guess was that the vfio_pci_dma_fault_init() method [2] that is
>> requesting DOMAIN_ATTR_NESTING causes stack overflow, and corruption.
>> That's when I added a new attribute.
> 
> I see. I think there needs a change in the code there. Should also expect
> a nesting_info returned instead of an int anymore. @Eric, how about your
> opinion?
> 
>   domain = iommu_get_domain_for_dev(&vdev->pdev->dev);
>   ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &info);
>   if (ret || !(info.features & IOMMU_NESTING_FEAT_PAGE_RESP)) {
>           /*
>            * No need go further as no page request service support.
>            */
>           return 0;
>   }
Sure I think it is "just" a matter of synchro between the 2 series. Yi,
do you have plans to 
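
The argsz handshake discussed in this thread can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct nesting_info` is a hypothetical, reduced layout (the real `struct iommu_nesting_info` has more fields) and `get_nesting_info()` merely mirrors the control flow of the quoted `arm_smmu_domain_nesting_info()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced layout used only for this sketch. */
struct nesting_info {
	uint32_t argsz;    /* in: caller's buffer size, out: size filled or required */
	uint32_t flags;
	uint32_t features;
};

/* Mirrors the quoted control flow: report the required size, or fill the buffer. */
static int get_nesting_info(void *data)
{
	struct nesting_info *info = data;
	uint32_t size = sizeof(*info);

	if (!info)
		return -ENODEV;

	/* Buffer too small: leave it untouched apart from the required size. */
	if (info->argsz < size) {
		info->argsz = size;
		return 0;
	}

	/* Buffer large enough: report an empty structure. */
	memset(info, 0, size);
	info->argsz = size;
	return 0;
}
```

The sketch also shows why a caller that still passes the address of an `int` is at risk: the callee interprets those four bytes as `argsz`, and if the value happens to be at least `sizeof(struct nesting_info)` the `memset()` writes past the caller's object. That matches the overflow Vivek reports, and it is why the caller and the attribute have to change together.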

Re: [PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm

2021-01-19 Thread Auger Eric
Hi Vivek,

On 1/15/21 1:13 PM, Vivek Gautam wrote:
> This patch-series aims at enabling Nested stage translation in guests
> using virtio-iommu as the paravirtualized iommu. The backend is supported
> with Arm SMMU-v3 that provides nested stage-1 and stage-2 translation.
> 
> This series derives its purpose from various efforts happening to add
> support for Shared Virtual Addressing (SVA) in host and guest. On Arm,
> most of the support for SVA has already landed. The support for nested
> stage translation and fault reporting to guest has been proposed [1].
> The related changes required in VFIO [2] framework have also been put
> forward.
> 
> This series proposes changes in virtio-iommu to program PASID tables
> and related stage-1 page tables. A simple iommu-pasid-table library
> is added for this purpose that interacts with vendor drivers to
> allocate and populate PASID tables.
> In Arm SMMUv3 we propose to pull the Context Descriptor (CD) management
> code out of the arm-smmu-v3 driver and add that as a glue vendor layer
> to support allocating CD tables, and populating them with right values.
> These CD tables are essentially the PASID tables and contain stage-1
> page table configurations too.
> A request to setup these CD tables come from virtio-iommu driver using
> the iommu-pasid-table library when running on Arm. The virtio-iommu
> then pass these PASID tables to the host using the right virtio backend
> and support in VMM.
> 
> For testing we have added necessary support in kvmtool. The changes in
> kvmtool are based on virtio-iommu development branch by Jean-Philippe
> Brucker [3].
> 
> The tested kernel branch contains following in the order bottom to top
> on the git hash -
> a) v5.11-rc3
> b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page
>table support for Arm.
> c) Smmu test engine patches from Jean-Philippe's branch [4]
> d) This series
> e) Domain nesting info patches [5][6][7].
> f) Changes to add arm-smmu-v3 specific nesting info (to be sent to
>the list).
> 
> This kernel is tested on Neoverse reference software stack with
> Fixed virtual platform. Public version of the software stack and
> FVP is available here[8][9].
> 
> A big thanks to Jean-Philippe for his contributions towards this work
> and for his valuable guidance.
> 
> [1] https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/
> [2] https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.au...@redhat.com/T/
> [3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel
> [4] https://jpbrucker.net/git/linux/log/?h=sva/smmute
> [5] https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/
> [6] https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/
> [7] https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/
> [8] https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
> [9] https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst

Could you share a public branch where we could find all the kernel pieces?

Thank you in advance

Best Regards

Eric
> 
> Jean-Philippe Brucker (6):
>   iommu/virtio: Add headers for table format probing
>   iommu/virtio: Add table format probing
>   iommu/virtio: Add headers for binding pasid table in iommu
>   iommu/virtio: Add support for INVALIDATE request
>   iommu/virtio: Attach Arm PASID tables when available
>   iommu/virtio: Add support for Arm LPAE page table format
> 
> Vivek Gautam (9):
>   iommu/arm-smmu-v3: Create a Context Descriptor library
>   iommu: Add a simple PASID table library
>   iommu/arm-smmu-v3: Update drivers to work with iommu-pasid-table
>   iommu/arm-smmu-v3: Update CD base address info for user-space
>   iommu/arm-smmu-v3: Set sync op from consumer driver of cd-lib
>   iommu: Add asid_bits to arm smmu-v3 stage1 table info
>   iommu/virtio: Update table format probing header
>   iommu/virtio: Prepare to add attach pasid table infrastructure
>   iommu/virtio: Update fault type and reason info for viommu fault
> 
>  drivers/iommu/arm/arm-smmu-v3/Makefile|   2 +-
>  .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c  | 283 +++
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  16 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 268 +--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   4 +-
>  drivers/iommu/iommu-pasid-table.h | 140 
>  drivers/iommu/virtio-iommu.c  | 692 +-
>  include/uapi/linux/iommu.h|   2 +-
>  include/uapi/linux/virtio_iommu.h | 158 +++-
>  9 files changed, 1303 insertions(+), 262 deletions(-)
>  create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
>  create mode 100644 drivers/iommu/iommu-pasid-table.h
> 

___
iommu mailing list

Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Auger Eric
Hi Jean,

On 1/14/21 6:33 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On Thu, Jan 14, 2021 at 05:58:27PM +0100, Auger Eric wrote:
>>>>  The uacce-devel branches from
>>>>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>>>>> (they track the latest sva/zip-devel branch
>>>>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
>> As I plan to respin shortly, please could you confirm the best branch to
>> rebase on still is that one (uacce-devel from the linux-kernel-uadk git
>> repo). Is it up to date? Commits seem to be quite old there.
> 
> Right I meant the uacce-devel-X branches. The uacce-devel-5.11 branch
> currently has the latest patches

OK thanks!

Eric
> 
> Thanks,
> Jean
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Auger Eric
Hi Shameer, Jean-Philippe,

On 12/4/20 11:23 AM, Auger Eric wrote:
> Hi Shameer, Jean-Philippe,
> 
> On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
>> Hi Jean,
>>
>>> -Original Message-
>>> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
>>> Sent: 04 December 2020 09:54
>>> To: Shameerali Kolothum Thodi 
>>> Cc: Auger Eric ; wangxingang
>>> ; Xieyingtai ;
>>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
>>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>>> vivek.gau...@arm.com; alex.william...@redhat.com;
>>> zhangfei@linaro.org; robin.mur...@arm.com;
>>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
>>> ; qubingbing 
>>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>>> unmanaged ASIDs
>>>
>>> Hi Shameer,
>>>
>>> On Thu, Dec 03, 2020 at 06:42:57PM +, Shameerali Kolothum Thodi wrote:
>>>> Hi Jean/zhangfei,
>>>> Is it possible to have a branch with minimum required SVA/UACCE related
>>> patches
>>>> that are already public and can be a "stable" candidate for future respin of
>>> Eric's series?
>>>> Please share your thoughts.
>>>
>>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
>>> based on mainline? 
>>
>> Yes. 
>>
>>  The uacce-devel branches from
>>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>>> (they track the latest sva/zip-devel branch
>>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
As I plan to respin shortly, please could you confirm the best branch to
rebase on still is that one (uacce-devel from the linux-kernel-uadk git
repo). Is it up to date? Commits seem to be quite old there.

Thanks

Eric
>>
>> Thanks. 
>>
>> Hi Eric,
>>
>> Could you please take a look at the above branches and see whether it make sense
>> to rebase on top of either of those?
>>
>> From vSVA point of view, it will be less rebase hassle if we can do that.
> 
> Sure. I will rebase on top of this ;-)
> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Shameer
>>
>>> Thanks,
>>> Jean
>>
> 


Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-01-13 Thread Auger Eric
Hi Shameer,

On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 18 November 2020 11:22
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>> alex.william...@redhat.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>> nicoleots...@gmail.com; yuzenghui 
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities gets implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
> 
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
> 
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 
> Flags: fast devsel
> Memory at 800010 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 80 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> 
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/:00:02.0/driver_override
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix socket0  -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: :00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool : n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> 
> Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
> 
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
> 
> 2). Guest have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 
> Flags: bus master, fast devsel, latency 0
> Memory at 800010 (64-bit, prefetchable) [size=64K]
> Memory at 80 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
> 
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/:00:02.0/driver_override
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe 

Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-04 Thread Auger Eric
Hi Shameer, Jean-Philippe,

On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
> Hi Jean,
> 
>> -Original Message-
>> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
>> Sent: 04 December 2020 09:54
>> To: Shameerali Kolothum Thodi 
>> Cc: Auger Eric ; wangxingang
>> ; Xieyingtai ;
>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> vivek.gau...@arm.com; alex.william...@redhat.com;
>> zhangfei@linaro.org; robin.mur...@arm.com;
>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
>> ; qubingbing 
>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>> unmanaged ASIDs
>>
>> Hi Shameer,
>>
>> On Thu, Dec 03, 2020 at 06:42:57PM +, Shameerali Kolothum Thodi wrote:
>>> Hi Jean/zhangfei,
>>> Is it possible to have a branch with minimum required SVA/UACCE related
>> patches
>>> that are already public and can be a "stable" candidate for future respin of
>> Eric's series?
>>> Please share your thoughts.
>>
>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
>> based on mainline? 
> 
> Yes. 
> 
>  The uacce-devel branches from
>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>> (they track the latest sva/zip-devel branch
>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> 
> Thanks. 
> 
> Hi Eric,
> 
> Could you please take a look at the above branches and see whether it make sense
> to rebase on top of either of those?
> 
> From vSVA point of view, it will be less rebase hassle if we can do that.

Sure. I will rebase on top of this ;-)

Thanks

Eric
> 
> Thanks,
> Shameer
> 
>> Thanks,
>> Jean
> 


Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

2020-12-03 Thread Auger Eric
Hi Kunkun,

On 12/3/20 1:32 PM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> When nested stage translation is setup, both s1_cfg and
>> s2_cfg are set.
>>
>> We introduce a new smmu domain abort field that will be set
>> upon guest stage1 configuration passing.
>>
>> arm_smmu_write_strtab_ent() is modified to write both stage
>> fields in the STE and deal with the abort field.
>>
>> In nested mode, only stage 2 is "finalized" as the host does
>> not own/configure the stage 1 context descriptor; guest does.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v10 -> v11:
>> - Fix an issue reported by Shameer when switching from with vSMMU
>>   to without vSMMU. Despite the spec does not seem to mention it
>>   seems to be needed to reset the 2 high 64b when switching from
>>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>>   On some implementations, if the S2TTB is not reset, this causes
>>   a C_BAD_STE error
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>  2 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 18ac5af1b284..412ea1bafa50 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>   * three cases at the moment:
> Now, it should be *five cases*.
>>   *
>>   * 1. Invalid (all zero) -> bypass/fault (init)
>> - * 2. Bypass/fault -> translation/bypass (attach)
>> - * 3. Translation/bypass -> bypass/fault (detach)
>> + * 2. Bypass/fault -> single stage translation/bypass (attach)
>> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>> + * 4. S2 -> S1 + S2 (attach_pasid_table)
> 
> I was testing this series on one of our hardware board with SMMUv3. And
> I found while trying to "attach_pasid_table",
> the sequence of STE (host) config(bit[3:1]) is "S2->abort->S1 + S2".
> Because the maintenance is "Write everything apart
> from dword 0, sync, write dword 0, sync" when we update the STE
> (guest). Does the sequence meet your expectation?

Yes, it does. I will fix the comments accordingly.

Is there anything to correct in the code or was it functional?
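
For readers following the quoted hunk, the way the new `abort`, `translate` and `nested` flags select the STE configuration can be modelled as below. This is a simplified sketch, not the driver code: `enum ste_cfg`, `struct domain_state` and `pick_cfg()` are hypothetical names, and the bypass branch is an assumption because the quoted diff is cut off before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels; the driver uses STRTAB_STE_0_CFG_* field values. */
enum ste_cfg { CFG_ABORT, CFG_BYPASS, CFG_S1, CFG_S2, CFG_NESTED };

/* Hypothetical reduction of the per-domain state consulted by the hunk. */
struct domain_state {
	bool abort;  /* set when the guest stage 1 configuration asks for abort */
	bool s1_set; /* stage 1 configuration present */
	bool s2_set; /* stage 2 configuration present */
};

/* d == NULL models "no domain attached". */
static enum ste_cfg pick_cfg(const struct domain_state *d, bool disable_bypass)
{
	bool abort = d ? d->abort : disable_bypass;
	bool translate = d && (d->s1_set || d->s2_set);
	bool nested = d && d->s1_set && d->s2_set;

	if (abort)
		return CFG_ABORT;
	if (!translate)
		return CFG_BYPASS; /* assumed: branch not visible in the quote */
	if (nested)
		return CFG_NESTED;
	return d->s1_set ? CFG_S1 : CFG_S2;
}
```

In this model a stage-2-only domain whose abort flag is raised and then cleared once stage 1 is also set goes through S2, abort, S1 + S2, which is the order Kunkun reports seeing in the STE.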

Thanks

Eric
> 
>> + * 5. S1 + S2 -> S2 (detach_pasid_table)
>>   *
>>   * Given that we can't update the STE atomically and the SMMU
>>   * doesn't read the thing in a defined order, that leaves us
>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>           * 3. Update Config, sync
>>           */
>>          u64 val = le64_to_cpu(dst[0]);
>> -        bool ste_live = false;
>> +        bool s1_live = false, s2_live = false, ste_live;
>> +        bool abort, nested = false, translate = false;
>>          struct arm_smmu_device *smmu = NULL;
>>          struct arm_smmu_s1_cfg *s1_cfg;
>>          struct arm_smmu_s2_cfg *s2_cfg;
>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>                  default:
>>                          break;
>>                  }
>> +                nested = s1_cfg->set && s2_cfg->set;
>> +                translate = s1_cfg->set || s2_cfg->set;
>>          }
>>  
>>          if (val & STRTAB_STE_0_V) {
>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>                  case STRTAB_STE_0_CFG_BYPASS:
>>                          break;
>>                  case STRTAB_STE_0_CFG_S1_TRANS:
>> +                        s1_live = true;
>> +                        break;
>>                  case STRTAB_STE_0_CFG_S2_TRANS:
>> -                        ste_live = true;
>> +                        s2_live = true;
>> +                        break;
>> +                case STRTAB_STE_0_CFG_NESTED:
>> +                        s1_live = true;
>> +                        s2_live = true;
>>                          break;
>>                  case STRTAB_STE_0_CFG_ABORT:
>> -                        BUG_ON(!disable_bypass);
>>                          break;
>>                  default:
>>                          BUG(); /* STE corruption */
>>                  }
>>          }
>>  
>> +        ste_live = s1_live || s2_live;
>> +
>>          /* Nuke the existing STE_0 value, as we're going to rewrite it */
>>          val = STRTAB_STE_0_V;
>>  
>>          /* Bypass/fault */
>> -        if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>> -                if (!smmu_domain && disable_bypass)
>> +
>> +        if (!smmu_domain)
>> +                abort = disable_bypass;
>> +        else
>> +                abort = smmu_domain->abort;
>> +
>> +        if (abort || !translate) {
>> +                if (abort)
>>                          val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>>                  else
>>                          val 

Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-01 Thread Auger Eric
Hi Xingang,

On 12/1/20 2:33 PM, Xingang Wang wrote:
> Hi Eric
> 
> On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>   * insertion to guarantee those are observed before the TLBI. Do be
>>   * careful, 007.
>>   */
>> -if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +if (ext_asid >= 0) { /* guest stage 1 invalidation */
>> +cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
>> +cmd.tlbi.asid   = ext_asid;
>> +cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>> +} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> 
> Found a problem here, the cmd for guest stage 1 invalidation is built,
> but it is not delivered to smmu.
> 

Thank you for the report. I will fix that soon. With that fixed, have
you been able to run vSVA on top of the series? Do you need other stuff
to be fixed at SMMU level? As I am going to respin soon, please let me
know which branch is best to rebase on to ease your integration.

Best Regards

Eric


Re: [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices

2020-11-24 Thread Auger Eric
Hi Shameer, Qubingbing
On 11/23/20 1:51 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 16 November 2020 11:00
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>> alex.william...@redhat.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>> nicoleots...@gmail.com; yuzenghui 
>> Subject: [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt
>> indices
>>
>> Implement IRQ capability chain infrastructure. All interrupt
>> indexes beyond VFIO_PCI_NUM_IRQS are handled as extended
>> interrupts. They are registered with a specific type/subtype
>> and supported flags.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/vfio/pci/vfio_pci.c | 99 +++--
>>  drivers/vfio/pci/vfio_pci_intrs.c   | 62 ++
>>  drivers/vfio/pci/vfio_pci_private.h | 14 
>>  3 files changed, 157 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 2a6cc1a87323..93e03a4a5f32 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -608,6 +608,14 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
>>
>>  WARN_ON(iommu_unregister_device_fault_handler(&vdev->pdev->dev));
>>
>> +for (i = 0; i < vdev->num_ext_irqs; i++)
>> +vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
>> +VFIO_IRQ_SET_ACTION_TRIGGER,
>> +VFIO_PCI_NUM_IRQS + i, 0, 0, NULL);
>> +vdev->num_ext_irqs = 0;
>> +kfree(vdev->ext_irqs);
>> +vdev->ext_irqs = NULL;
>> +
>>  /* Device closed, don't need mutex here */
>>  list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
>>   &vdev->ioeventfds_list, next) {
>> @@ -823,6 +831,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>>  return 1;
>>  } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>>  return 1;
>> +} else if (irq_type >= VFIO_PCI_NUM_IRQS &&
>> +   irq_type < VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs) {
>> +return 1;
>>  }
>>
>>  return 0;
>> @@ -1008,7 +1019,7 @@ static long vfio_pci_ioctl(void *device_data,
>>  info.flags |= VFIO_DEVICE_FLAGS_RESET;
>>
>>  info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
>> -info.num_irqs = VFIO_PCI_NUM_IRQS;
>> +info.num_irqs = VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs;
>>
>>  if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV)) {
>>  int ret = vfio_pci_info_zdev_add_caps(vdev, &caps);
>> @@ -1187,36 +1198,87 @@ static long vfio_pci_ioctl(void *device_data,
>>
>>  } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>>  struct vfio_irq_info info;
>> +struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>> +unsigned long capsz;
>>
>>  minsz = offsetofend(struct vfio_irq_info, count);
>>
>> +/* For backward compatibility, cannot require this */
>> +capsz = offsetofend(struct vfio_irq_info, cap_offset);
>> +
>>  if (copy_from_user(&info, (void __user *)arg, minsz))
>>  return -EFAULT;
>>
>> -if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
>> +if (info.argsz < minsz ||
>> +info.index >= VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs)
>>  return -EINVAL;
>>
>> -switch (info.index) {
>> -case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
>> -case VFIO_PCI_REQ_IRQ_INDEX:
>> -break;
>> -case VFIO_PCI_ERR_IRQ_INDEX:
>> -if (pci_is_pcie(vdev->pdev))
>> -break;
>> -fallthrough;
>> -default:
>> -return -EINVAL;
>> -}
>> +if (info.argsz >= capsz)
>> +minsz = capsz;
>>
>>  info.flags = VFIO_IRQ_INFO_EVENTFD;
>>
>> -info.count = vfio_pci_get_irq_count(vdev, info.index);
>> -
>> -if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
>> +switch (info.index) {
>> +case VFIO_PCI_INTX_IRQ_INDEX:
>>  info.flags |= (VFIO_IRQ_INFO_MASKABLE |
>> VFIO_IRQ_INFO_AUTOMASKED);
>> -else
>> +break;
>> +case VFIO_PCI_MSI_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
>> +

Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API

2020-11-19 Thread Auger Eric
Hi Jacob,
On 11/18/20 5:19 PM, Jacob Pan wrote:
> Hi Eric,
> 
> On Wed, 18 Nov 2020 12:21:37 +0100, Eric Auger 
> wrote:
> 
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v12 -> v13:
>> - Fix config check
>>
>> v11 -> v12:
>> - add argsz, name the union
>> ---
>>  drivers/iommu/iommu.c  | 68 ++
>>  include/linux/iommu.h  | 21 
>>  include/uapi/linux/iommu.h | 54 ++
>>  3 files changed, 143 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b53446bb8c6b..978fe34378fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +                             struct iommu_pasid_table_config *cfg)
>> +{
>> +        if (unlikely(!domain->ops->attach_pasid_table))
>> +                return -ENODEV;
>> +
>> +        return domain->ops->attach_pasid_table(domain, cfg);
>> +}
>> +
>> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
>> +                                  void __user *uinfo)
>> +{
>> +        struct iommu_pasid_table_config pasid_table_data = { 0 };
>> +        u32 minsz;
>> +
>> +        if (unlikely(!domain->ops->attach_pasid_table))
>> +                return -ENODEV;
>> +
>> +        /*
>> +         * No new spaces can be added before the variable sized union, the
>> +         * minimum size is the offset to the union.
>> +         */
>> +        minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
>> +
>> +        /* Copy minsz from user to get flags and argsz */
>> +        if (copy_from_user(&pasid_table_data, uinfo, minsz))
>> +                return -EFAULT;
>> +
>> +        /* Fields before the variable size union are mandatory */
>> +        if (pasid_table_data.argsz < minsz)
>> +                return -EINVAL;
>> +
>> +        /* PASID and address granu require additional info beyond minsz */
>> +        if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
>> +                return -EINVAL;
>> +        if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
>> +            pasid_table_data.argsz <
>> +                offsetofend(struct iommu_pasid_table_config, vendor_data.smmuv3))
>> +                return -EINVAL;
>> +
>> +        /*
>> +         * User might be using a newer UAPI header which has a larger data
>> +         * size, we shall support the existing flags within the current
>> +         * size. Copy the remaining user data _after_ minsz but not more
>> +         * than the current kernel supported size.
>> +         */
>> +        if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
>> +                           min_t(u32, pasid_table_data.argsz, sizeof(pasid_table_data)) - minsz))
>> +                return -EFAULT;
>> +
>> +        /* Now the argsz is validated, check the content */
>> +        if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
>> +            pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
>> +                return -EINVAL;
>> +
>> +        return domain->ops->attach_pasid_table(domain, &pasid_table_data);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +        if (unlikely(!domain->ops->detach_pasid_table))
>> +                return;
>> +
>> +        domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index b95a6f8db6ff..464fcbecf841 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>>   * @cache_invalidate: invalidate translation caches
>>   * @sva_bind_gpasid: bind guest pasid and mm
>>   * @sva_unbind_gpasid: unbind guest pasid and mm
>> + * @attach_pasid_table: attach a pasid table
>> + * 
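
The two-step argsz copy in iommu_uapi_attach_pasid_table() quoted above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct demo_cfg and every demo_* name are hypothetical stand-ins, memcpy() replaces copy_from_user(), and the extra "argsz must not exceed the source buffer" check exists only because the sketch has no user/kernel boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct iommu_pasid_table_config. */
struct demo_cfg {
	uint32_t argsz;    /* size the caller claims to have filled in */
	uint32_t version;
	uint32_t format;
	uint32_t config;
	union {
		struct { uint64_t s1contextptr; } smmuv3;
		uint8_t pad[64];
	} vendor_data;     /* variable-sized tail */
};

/* Everything before the union is mandatory. */
#define DEMO_MINSZ offsetof(struct demo_cfg, vendor_data)

/* Returns 0 on success, -1 if the caller-supplied size is inconsistent. */
static int demo_copy_cfg(struct demo_cfg *dst, const void *src, size_t src_len)
{
	size_t copy;

	memset(dst, 0, sizeof(*dst));

	/* Step 1: copy the fixed part only, enough to learn argsz. */
	if (src_len < DEMO_MINSZ)
		return -1;
	memcpy(dst, src, DEMO_MINSZ);

	/* argsz must cover the mandatory fields and fit in the buffer. */
	if (dst->argsz < DEMO_MINSZ || dst->argsz > src_len)
		return -1;

	/* Step 2: copy the tail, capped at the size this side knows about. */
	copy = dst->argsz < sizeof(*dst) ? dst->argsz : sizeof(*dst);
	assert(copy >= DEMO_MINSZ);
	memcpy((uint8_t *)dst + DEMO_MINSZ,
	       (const uint8_t *)src + DEMO_MINSZ, copy - DEMO_MINSZ);
	return 0;
}
```

A caller built against a newer, larger layout would pass a larger argsz; the cap in step 2 means only the part known to this side is copied, which is the forward-compatibility property the patch comment describes.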

Re: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg

2020-11-17 Thread Auger Eric
Hi Shameer,

On 11/17/20 12:39 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 16 November 2020 10:43
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; t...@semihalf.com; nicoleots...@gmail.com
>> Subject: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and
>> s2_cfg
>>
>> In preparation for the introduction of nested stages
>> let's turn s1_cfg and s2_cfg fields into pointers which are
>> dynamically allocated depending on the smmu_domain stage.
> 
> This will break the build if CONFIG_ARM_SMMU_V3_SVA is enabled,
> because of
> https://github.com/eauger/linux/blob/5.10-rc4-2stage-v12/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c#L40
> 
> Do we really need to make these pointers?

Thanks for reporting. I think I can do this differently. Working on it now.

Thanks

Eric
> 
> Thanks,
> Shameer
>  
>> In nested mode, both stages will coexist and s1_cfg will
>> be allocated when the guest configuration gets passed.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 83 -
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 +-
>>  2 files changed, 48 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index d828d6cbeb0e..4baf9fafe462 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -953,9 +953,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct
>> arm_smmu_domain *smmu_domain,
>>  unsigned int idx;
>>  struct arm_smmu_l1_ctx_desc *l1_desc;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
>> +struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg->cdcfg;
>>
>> -if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>> +if (smmu_domain->s1_cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>>  return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS;
>>
>>  idx = ssid >> CTXDESC_SPLIT;
>> @@ -990,7 +990,7 @@ int arm_smmu_write_ctx_desc(struct
>> arm_smmu_domain *smmu_domain, int ssid,
>>  __le64 *cdptr;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>>
>> -if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax)))
>> +if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg->s1cdmax)))
>>  return -E2BIG;
>>
>>  cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
>> @@ -1056,7 +1056,7 @@ static int arm_smmu_alloc_cd_tables(struct
>> arm_smmu_domain *smmu_domain)
>>  size_t l1size;
>>  size_t max_contexts;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>> +struct arm_smmu_s1_cfg *cfg = smmu_domain->s1_cfg;
>>  struct arm_smmu_ctx_desc_cfg *cdcfg = &cfg->cdcfg;
>>
>>  max_contexts = 1 << cfg->s1cdmax;
>> @@ -1104,7 +1104,7 @@ static void arm_smmu_free_cd_tables(struct
>> arm_smmu_domain *smmu_domain)
>>  int i;
>>  size_t size, l1size;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
>> +struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg->cdcfg;
>>
>>  if (cdcfg->l1_desc) {
>>  size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
>> @@ -1211,17 +1211,8 @@ static void arm_smmu_write_strtab_ent(struct
>> arm_smmu_master *master, u32 sid,
>>  }
>>
>>  if (smmu_domain) {
>> -switch (smmu_domain->stage) {
>> -case ARM_SMMU_DOMAIN_S1:
>> -s1_cfg = &smmu_domain->s1_cfg;
>> -break;
>> -case ARM_SMMU_DOMAIN_S2:
>> -case ARM_SMMU_DOMAIN_NESTED:
>> -s2_cfg = &smmu_domain->s2_cfg;
>> -break;
>> -default:
>> -break;
>> -}
>> +s1_cfg = smmu_domain->s1_cfg;
>> +s2_cfg = smmu_domain->s2_cfg;
>>  }
>>
>>  if (val & STRTAB_STE_0_V) {
>> @@ -1664,10 +1655,10 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>   * careful, 007.
>>   */
>>  if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> -arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>> +arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg->cd.asid);
>>  } else {
>>  cmd.opcode  = CMDQ_OP_TLBI_S12_VMALL;
>> -cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>> +

Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-11-17 Thread Auger Eric
Hi Shameer,

On 5/13/20 5:57 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: 13 May 2020 14:29
>> To: Shameerali Kolothum Thodi ;
>> Zhangfei Gao ; eric.auger@gmail.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
>> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
>> jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
>> t...@semihalf.com; bbhush...@marvell.com
>> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
>>
> [...]
> 
>>>>> Yes that's normal this series is not meant to support vSVM at this stage.
>>>>>
>>>>> I intend to add the missing pieces during the next weeks.
>>>>
>>>> Thanks for that. I have made an attempt to add the vSVA based on
>>>> your v10 + JPBs sva patches. The host kernel and Qemu changes can
>>>> be found here[1][2].
>>>>
>>>> This basically adds multiple pasid support on top of your changes.
>>>> I have done some basic sanity testing and we have some initial success
>>>> with the zip vf dev on our D06 platform. Please note that the STALL event is
>>>> not yet supported though, but works fine if we mlock() guest usr mem.
>>>
>>> I have added STALL support for our vSVA prototype and it seems to be
>>> working (on our hardware). I have updated the kernel and qemu branches with
>>> the same[1][2]. I should warn you though that these are prototype code and I
>>> am pretty much re-using the VFIO_IOMMU_SET_PASID_TABLE interface for almost
>>> everything. But thought of sharing, in case it is useful somehow!
>>
>> Thank you again for sharing the POC. I looked at the kernel and QEMU
>> branches.
>>
>> Here are some preliminary comments:
>> - "arm-smmu-v3: Reset S2TTB while switching back from nested stage":  as
>> you mentioned, S2TTB reset is now featured in v11
> 
> Yes.
> 
>> - "arm-smmu-v3: Add support for multiple pasid in nested mode": I could
>> easily integrate this into my series. Update the iommu api first and
>> pass multiple CD info in a separate patch
> 
> Ok.
In v12, I added
[PATCH v12 14/15] iommu/smmuv3: Accept configs with more than one
context descriptor

I don't think you need to add s1cdmax, as we already have
pasid_bits, which should do the job.

>> - "arm-smmu-v3: Add support to Invalidate CD": CD invalidation should be
>> cascaded to host through the PASID cache invalidation uapi (no pb you
>> warned us for the POC you simply used VFIO_IOMMU_SET_PASID_TABLE). I
>> think I should add this support in my original series although it does
>> not seem to trigger any issue up to now.
> 
> Agree. Cache invalidation uapi is a better interface for this. Also I don’t
> think this matters for non-vsva cases as Guest kernel table/CD(pasid 0) will
> never get invalidated.
In v12 I added [PATCH v12 15/15] iommu/smmuv3: Add PASID cache
invalidation per PASID. I have not tested it though.
> 
>> - "arm-smmu-v3: Remove duplication of fault propagation". I understand
>> the transcode is done somewhere else with SVA but we still need to do it
>> if a single CD is used, right? I will review the SVA code to better
>> understand.

Since I have rebased on 5.10-rc4, you will still have this duplication to
handle.
> 
> Hmm..not sure. Need to take another look to see whether we need a special
> handling for single CD or not.
> 
>> - for the STALL response injection I would tend to use a new VFIO region
>> for responses. At the moment there is a single VFIO region for reporting
>> the fault.

In v12 I added a new VFIO region to inject your fault. This was tested
with dummy event injection, so it should work properly.

If we clearly identify all the public dependencies needed for vSVA/ARM, I
can help you respin on top of them.

Thanks

Eric
> 
> Sure. That will be much cleaner and probably improve the context switch
> latency. Another thing I noted with STALL is that pasid_valid flag needs to be
> taken care in the SVA kernel path. 
> 
> "iommu: Remove pasid validity check for STALL model page response msg"
> Not sure this one is a proper way to handle this.
>  
>> On QEMU side:
>> - I am currently working on 3.2 range invalidation support which is
>> needed for DPDK/VFIO
>> - While at it I will look at how to incrementally introduce some of the
>> features you need in this series.
> 
> Ok. 
> 
> Thanks for taking a look at the POC.
> 
> Cheers,
> Shameer
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v10 04/11] vfio/pci: Add VFIO_REGION_TYPE_NESTED region type

2020-11-13 Thread Auger Eric
Hi Zenghui,
On 9/24/20 10:23 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2020/3/21 0:19, Eric Auger wrote:
>> Add a new specific DMA_FAULT region aiming to expose nested mode
>> translation faults.
>>
>> The region has a ring buffer that contains the actual fault
>> records plus a header allowing to handle it (tail/head indices,
>> max capacity, entry size). At the moment the region is dimensioned
>> for 512 fault records.
>>
>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 379a02c36e37..586b89debed5 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -260,6 +260,69 @@ int vfio_pci_set_power_state(struct
>> vfio_pci_device *vdev, pci_power_t state)
>>   return ret;
>>   }
>>   +static void vfio_pci_dma_fault_release(struct vfio_pci_device *vdev,
>> +   struct vfio_pci_region *region)
>> +{
>> +}
>> +
>> +static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device
>> *vdev,
>> + struct vfio_pci_region *region,
>> + struct vfio_info_cap *caps)
>> +{
>> +    struct vfio_region_info_cap_fault cap = {
>> +    .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
>> +    .header.version = 1,
>> +    .version = 1,
>> +    };
>> +    return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
>> +}
>> +
>> +static const struct vfio_pci_regops vfio_pci_dma_fault_regops = {
>> +    .rw    = vfio_pci_dma_fault_rw,
>> +    .release    = vfio_pci_dma_fault_release,
>> +    .add_capability = vfio_pci_dma_fault_add_capability,
>> +};
>> +
>> +#define DMA_FAULT_RING_LENGTH 512
>> +
>> +static int vfio_pci_init_dma_fault_region(struct vfio_pci_device *vdev)
>> +{
>> +    struct vfio_region_dma_fault *header;
>> +    size_t size;
>> +    int ret;
>> +
>> +    mutex_init(&vdev->fault_queue_lock);
>> +
>> +    /*
>> + * We provision 1 page for the header and space for
>> + * DMA_FAULT_RING_LENGTH fault records in the ring buffer.
>> + */
>> +    size = ALIGN(sizeof(struct iommu_fault) *
>> + DMA_FAULT_RING_LENGTH, PAGE_SIZE) + PAGE_SIZE;
>> +
>> +    vdev->fault_pages = kzalloc(size, GFP_KERNEL);
>> +    if (!vdev->fault_pages)
>> +    return -ENOMEM;
>> +
>> +    ret = vfio_pci_register_dev_region(vdev,
>> +    VFIO_REGION_TYPE_NESTED,
>> +    VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT,
>> +    &vfio_pci_dma_fault_regops, size,
>> +    VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE,
>> +    vdev->fault_pages);
>> +    if (ret)
>> +    goto out;
>> +
>> +    header = (struct vfio_region_dma_fault *)vdev->fault_pages;
>> +    header->entry_size = sizeof(struct iommu_fault);
>> +    header->nb_entries = DMA_FAULT_RING_LENGTH;
>> +    header->offset = sizeof(struct vfio_region_dma_fault);
>> +    return 0;
>> +out:
>> +    kfree(vdev->fault_pages);
>> +    return ret;
>> +}
>> +
>>   static int vfio_pci_enable(struct vfio_pci_device *vdev)
>>   {
>>   struct pci_dev *pdev = vdev->pdev;
>> @@ -358,6 +421,10 @@ static int vfio_pci_enable(struct vfio_pci_device
>> *vdev)
>>   }
>>   }
>>   +    ret = vfio_pci_init_dma_fault_region(vdev);
>> +    if (ret)
>> +    goto disable_exit;
>> +
>>   vfio_pci_probe_mmaps(vdev);
>>     return 0;
>> @@ -1383,6 +1450,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
>>     vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
>>   kfree(vdev->region);
>> +    kfree(vdev->fault_pages);
>>   mutex_destroy(&vdev->ioeventfds_lock);
>>     if (!disable_idle_d3)
>> diff --git a/drivers/vfio/pci/vfio_pci_private.h
>> b/drivers/vfio/pci/vfio_pci_private.h
>> index 8a2c7607d513..a392f50e3a99 100644
>> --- a/drivers/vfio/pci/vfio_pci_private.h
>> +++ b/drivers/vfio/pci/vfio_pci_private.h
>> @@ -119,6 +119,8 @@ struct vfio_pci_device {
>>   int    ioeventfds_nr;
>>   struct eventfd_ctx    *err_trigger;
>>   struct eventfd_ctx    *req_trigger;
>> +    u8    *fault_pages;
> 
> What's the reason to use 'u8'? It doesn't match the type of header, nor
> the type of ring buffer.
Actually it matches the existing fields:
u8  *pci_config_map;
u8  *vconfig;

fault_pages is the va of the ring buffer. In the header, offset is the
offset of the ring with respect to the start of the region.

> 
>> +    struct mutex    fault_queue_lock;
>>   struct list_head    dummy_resources_list;
>>   struct mutex    ioeventfds_lock;
>>   struct list_head    ioeventfds_list;
>> @@ -150,6 +152,14 @@ extern ssize_t vfio_pci_vga_rw(struct
>> vfio_pci_device *vdev, char __user *buf,
>>   extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t
>> offset,
>>  uint64_t data, int count, int fd);
>>   +struct vfio_pci_fault_abi {
>> +    u32 entry_size;
>> +};
> 
> This is not used by this patch (and the whole series).
removed
> 
>> +
>> +extern size_t 
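
The region layout discussed above (one page for the header, a page-aligned ring after it, and a u8 base pointer used for byte arithmetic) can be sketched as follows. This is a minimal userspace illustration, not the driver code: the demo_* names, the 4 KiB page size, and the header field set are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE   4096u   /* assumption: 4 KiB pages */
#define DEMO_RING_LENGTH 512u

/* Hypothetical header mirroring the fields set in the quoted patch. */
struct demo_fault_header {
	uint32_t entry_size;
	uint32_t nb_entries;
	uint32_t offset;   /* byte offset of the ring from the region start */
	uint32_t head;
	uint32_t tail;
};

/* Round x up to a multiple of a (a must be a power of two). */
static size_t demo_align(size_t x, size_t a)
{
	assert(a && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* One page for the header plus the page-aligned ring. */
static size_t demo_region_size(size_t entry_size)
{
	return demo_align(entry_size * DEMO_RING_LENGTH, DEMO_PAGE_SIZE)
	       + DEMO_PAGE_SIZE;
}

/* Address of record idx, computed from a byte pointer like fault_pages. */
static uint8_t *demo_entry(uint8_t *base, uint32_t idx)
{
	const struct demo_fault_header *h =
		(const struct demo_fault_header *)base;

	return base + h->offset + (size_t)idx * h->entry_size;
}
```

Keeping the base as a byte pointer makes the offset arithmetic in demo_entry() explicit, which is one way to read the choice of u8 for fault_pages.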

Re: [PATCH v10 05/11] vfio/pci: Register an iommu fault handler

2020-11-13 Thread Auger Eric
Hi Zenghui,

On 9/24/20 10:49 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2020/3/21 0:19, Eric Auger wrote:
>> Register an IOMMU fault handler which records faults in
>> the DMA FAULT region ring buffer. In a subsequent patch, we
>> will add the signaling of a specific eventfd to allow the
>> userspace to be notified whenever a new fault has shown up.
>>
>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 586b89debed5..69595c240baf 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -27,6 +27,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>     #include "vfio_pci_private.h"
>>   @@ -283,6 +284,38 @@ static const struct vfio_pci_regops
>> vfio_pci_dma_fault_regops = {
>>   .add_capability = vfio_pci_dma_fault_add_capability,
>>   };
>>   +int vfio_pci_iommu_dev_fault_handler(struct iommu_fault *fault,
>> void *data)
>> +{
>> +    struct vfio_pci_device *vdev = (struct vfio_pci_device *)data;
>> +    struct vfio_region_dma_fault *reg =
>> +    (struct vfio_region_dma_fault *)vdev->fault_pages;
>> +    struct iommu_fault *new =
>> +    (struct iommu_fault *)(vdev->fault_pages + reg->offset +
>> +    reg->head * reg->entry_size);
> 
> Shouldn't 'reg->head' be protected under the fault_queue_lock? Otherwise
> things may change behind our backs...
>
> We shouldn't make any assumption about how IOMMU driver would report the
> fault (serially or in parallel), I think.

Yes, I modified the locking.

Thanks

Eric
> 
>> +    int head, tail, size;
>> +    int ret = 0;
>> +
>> +    if (fault->type != IOMMU_FAULT_DMA_UNRECOV)
>> +    return -ENOENT;
>> +
>> +    mutex_lock(&vdev->fault_queue_lock);
>> +
>> +    head = reg->head;
>> +    tail = reg->tail;
>> +    size = reg->nb_entries;
>> +
>> +    if (CIRC_SPACE(head, tail, size) < 1) {
>> +    ret = -ENOSPC;
>> +    goto unlock;
>> +    }
>> +
>> +    *new = *fault;
>> +    reg->head = (head + 1) % size;
>> +unlock:
>> +    mutex_unlock(&vdev->fault_queue_lock);
>> +    return ret;
>> +}
>> +
>>   #define DMA_FAULT_RING_LENGTH 512
>>     static int vfio_pci_init_dma_fault_region(struct vfio_pci_device
>> *vdev)
>> @@ -317,6 +350,13 @@ static int vfio_pci_init_dma_fault_region(struct
>> vfio_pci_device *vdev)
>>   header->entry_size = sizeof(struct iommu_fault);
>>   header->nb_entries = DMA_FAULT_RING_LENGTH;
>>   header->offset = sizeof(struct vfio_region_dma_fault);
>> +
>> +    ret = iommu_register_device_fault_handler(&vdev->pdev->dev,
>> +    vfio_pci_iommu_dev_fault_handler,
>> +    vdev);
>> +    if (ret)
>> +    goto out;
>> +
>>   return 0;
>>   out:
>>   kfree(vdev->fault_pages);
> 
> 
> Thanks,
> Zenghui
> 
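
The locking point raised in this thread (sample head only once the queue lock is held, then compute the slot) can be sketched in userspace C. This is a hedged illustration, not the respun patch: pthread_mutex_t stands in for the kernel mutex, demo_space() imitates the CIRC_SPACE() semantics for a power-of-two ring, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u   /* must be a power of two */

static_assert((DEMO_RING_SIZE & (DEMO_RING_SIZE - 1)) == 0,
	      "ring size must be a power of two");

struct demo_fault { uint64_t addr; uint32_t reason; };

struct demo_ring {
	pthread_mutex_t lock;
	uint32_t head;   /* next slot the producer writes */
	uint32_t tail;   /* next slot the consumer reads */
	struct demo_fault entries[DEMO_RING_SIZE];
};

/* Free slots; one slot stays empty to tell "full" from "empty". */
static uint32_t demo_space(uint32_t head, uint32_t tail)
{
	return (tail - (head + 1)) & (DEMO_RING_SIZE - 1);
}

/* Returns 0 on success, -1 if the ring is full. */
static int demo_push(struct demo_ring *r, const struct demo_fault *f)
{
	int ret = 0;

	pthread_mutex_lock(&r->lock);
	/* head is read and the slot is chosen only while the lock is held */
	if (demo_space(r->head, r->tail) < 1) {
		ret = -1;
	} else {
		r->entries[r->head] = *f;
		r->head = (r->head + 1) & (DEMO_RING_SIZE - 1);
	}
	pthread_mutex_unlock(&r->lock);
	return ret;
}
```

With this shape, two producers reporting faults in parallel cannot both pick the same slot, because the slot index is derived from head inside the critical section.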

Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

2020-10-27 Thread Auger Eric
Hi Shameer,

On 10/27/20 1:20 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
>> Auger Eric
>> Sent: 23 September 2020 12:47
>> To: yuzenghui ; eric.auger@gmail.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; robin.mur...@arm.com
>> Subject: Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE
> 
> ...
> 
>>> Besides, before going through the whole series [1][2], I'd like to
>>> know if this is the latest version of your Nested-Stage-Setup work in
>>> case I had missed something.
>>>
>>> [1]
>>> https://lore.kernel.org/r/20200320161911.27494-1-eric.au...@redhat.com
>>> [2]
>>> https://lore.kernel.org/r/20200414150607.28488-1-eric.au...@redhat.com
>>
>> Yes, those 2 series are the latest ones. Thank you for reviewing.
>>
>> FYI, I intend to respin within a week or 2 on top of Jacob's  [PATCH v10 0/7]
>> IOMMU user API enhancement. 
> 
> Thanks for that. Also is there any plan to respin the related Qemu series as
> well? I know dual stage interface proposals are still under discussion, but
> it would be nice to have a testable solution based on new interfaces for
> ARM64 as well. Happy to help with any tests or verifications.

Yes, the QEMU series will be respun as well. That's at the top of my
todo list right now.

Thanks

Eric
> 
> Please let me know.
> 
> Thanks,
> Shameer
>   
> 


Re: [RFC 0/3] iommu: Reserved regions for IOVAs beyond dma_mask and iommu aperture

2020-10-06 Thread Auger Eric
Hi all,

On 10/5/20 3:08 PM, Christoph Hellwig wrote:
> On Mon, Oct 05, 2020 at 11:44:10AM +0100, Lorenzo Pieralisi wrote:
>>> I see that there are both OF and ACPI hooks in pci_dma_configure() and
>>> both modify dev->dma_mask, which is what pci-sysfs is exposing here,
>>> but I'm not convinced this even does what it's intended to do.  The
>>> driver core calls this via the bus->dma_configure callback before
>>> probing a driver, but then what happens when the driver calls
>>> pci_set_dma_mask()?  This is just a wrapper for dma_set_mask() and I
>>> don't see anywhere that would take into account the existing
>>> dev->dma_mask.  It seems for example that pci_dma_configure() could
>>> produce a 42 bit mask as we have here, then the driver could override
>>> that with anything that the dma_ops.dma_supported() callback finds
>>> acceptable, and I don't see any instances where the current
>>> dev->dma_mask is considered.  Am I overlooking something? 
>>
>> I don't think so but Christoph and Robin can provide more input on
>> this - it is a long story.
>>
>> ACPI and OF bindings set a default dma_mask (and dev->bus_dma_limit),
>> this does not prevent a driver from overriding the dev->dma_mask but DMA
>> mapping code still takes into account the dev->bus_dma_limit.
>>
>> This may help:
>>
>> git log -p 03bfdc31176c

Thank you Lorenzo for the pointer.
> 
> This is at best a historic artefact.  Bus drivers have no business
> messing with the DMA mask, dev->bus_dma_limit is the way to communicate
> addressing limits on the bus (or another interconnect closer to the CPU).
> 
Then could I consider using dev->bus_dma_limit instead of
dev->dma_mask?

Nevertheless, I would need a way to let userspace know that the
usable IOVA ranges reported by VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
take into account the addressing limits of the bus.

Thanks

Eric
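
To make the idea concrete, here is a minimal sketch of clipping a list of usable IOVA ranges to a bus addressing limit before reporting them. It is only an illustration: struct demo_iova_range and demo_clip_ranges() are hypothetical names, and how the limit would be obtained (for example from dev->bus_dma_limit) is left as an assumption outside the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical usable range, both bounds inclusive. */
struct demo_iova_range { uint64_t start; uint64_t end; };

/*
 * Clip ranges[] in place so that no range extends past limit (inclusive).
 * Returns the number of ranges that remain.
 */
static size_t demo_clip_ranges(struct demo_iova_range *ranges, size_t n,
			       uint64_t limit)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		assert(ranges[i].start <= ranges[i].end);
		if (ranges[i].start > limit)
			continue;                /* entirely above the limit */
		ranges[out] = ranges[i];
		if (ranges[out].end > limit)
			ranges[out].end = limit; /* straddles the limit */
		out++;
	}
	return out;
}
```

With a 42-bit limit, a range ending at (1ULL << 48) - 1 would be truncated to end at (1ULL << 42) - 1, and a range starting above the limit would be dropped.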


Re: [PATCH v10 11/11] vfio: Document nested stage control

2020-10-06 Thread Auger Eric
Hi Zenghui,

On 9/24/20 3:42 PM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2020/3/21 0:19, Eric Auger wrote:
>> The VFIO API was enhanced to support nested stage control: a bunch of
>> new ioctls, one DMA FAULT region and an associated specific IRQ.
>>
>> Let's document the process to follow to set up nested mode.
>>
>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>> +The userspace must be prepared to receive faults. The VFIO-PCI device
>> +exposes one dedicated DMA FAULT region: it contains a ring buffer and
>> +its header that allows to manage the head/tail indices. The region is
>> +identified by the following index/subindex:
>> +- VFIO_REGION_TYPE_NESTED/VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT
>> +
>> +The DMA FAULT region exposes a VFIO_REGION_INFO_CAP_PRODUCER_FAULT
>> +region capability that allows the userspace to retrieve the ABI version
>> +of the fault records filled by the host.
> 
> Nit: I don't see this capability in the code.

Thank you very much for the review. I am late doing the respin but I
will take into account all your comments.

Thanks!

Eric
> 
> 
> Thanks,
> Zenghui
> 


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-10-06 Thread Auger Eric
Hello Al,

On 10/2/20 8:23 PM, Al Stone wrote:
> On 24 Sep 2020 11:54, Auger Eric wrote:
>> Hi,
>>
>> Adding Al in the loop
>>
>> On 9/24/20 11:38 AM, Michael S. Tsirkin wrote:
>>> On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
>>>> On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
>>>>> OK so this looks good. Can you pls repost with the minor tweak
>>>>> suggested and all acks included, and I will queue this?
>>>>
>>>> My NACK still stands, as long as a few questions are open:
>>>>
>>>>1) The format used here will be the same as in the ACPI table? I
>>>>   think the answer to this questions must be Yes, so this leads
>>>>   to the real question:
>>>
>>> I am not sure it's a must.
>>> We can always tweak the parser if there are slight differences
>>> between ACPI and virtio formats.
>>>
>>> But we do want the virtio format used here to be approved by the virtio
>>> TC, so it won't change.
>>>
>>> Eric, Jean-Philippe, does one of you intend to create a github issue
>>> and request a ballot for the TC? It's been posted end of August with no
>>> changes ...
>> Jean-Philippe, would you?
>>>
>>>>2) Has the ACPI table format stabilized already? If and only if
>>>>   the answer is Yes I will Ack these patches. We don't need to
>>>>   wait until the ACPI table format is published in a
>>>>   specification update, but at least some certainty that it
>>>>   will not change in incompatible ways anymore is needed.
>>>>
>>
>> Al, do you have any news about the VIOT definition submission to
>> the UEFI ASWG?
>>
>> Thank you in advance
>>
>> Best Regards
>>
>> Eric
> 
> A follow-up to my earlier post 
> 
> Hearing no objection, I've submitted the VIOT table description to
> the ASWG for consideration under what they call the "code first"
> process.  The "first reading" -- a brief discussion on what the
> table is and why we would like to add it -- was held yesterday.
> No concerns have been raised as yet.  Given the discussions that
> have already occurred, I don't expect any, either.  I have been
> wrong at least once before, however.
> 
> At this point, ASWG will revisit the request to add VIOT each
> week.  If there have been no comments in the prior week, and no
> further discussion during the meeting, then a vote will be taken.
> Otherwise, there will be discussion and we try again the next
> week.
> 
> The ASWG was also told that the likelihood of this definition of
> the table changing is pretty low, and that it has been thought out
> pretty well already.  ASWG's consideration will therefore start
> from the assumption that it would be best _not_ to make changes.
> 
> So, I'll let you know what happens next week.

Thank you very much for the updates and for your support in backing the
proposal so promptly.

Best Regards

Eric
> 
>>
>>>
>>> Not that I know, but I don't see why it's a must.
>>>
>>>> So what progress has been made with the ACPI table specification, is it
>>>> just a matter of time to get it approved or are there concerns?
>>>>
>>>> Regards,
>>>>
>>>>Joerg
>>>
>>
> 


Re: [RFC 0/3] iommu: Reserved regions for IOVAs beyond dma_mask and iommu aperture

2020-09-30 Thread Auger Eric
Hi Alex,

On 9/29/20 8:18 PM, Alex Williamson wrote:
> On Tue, 29 Sep 2020 09:18:22 +0200
> Auger Eric  wrote:
> 
>> Hi all,
>>
>> [also correcting some outdated email addresses + adding Lorenzo in cc]
>>
>> On 9/29/20 12:42 AM, Alex Williamson wrote:
>>> On Mon, 28 Sep 2020 21:50:34 +0200
>>> Eric Auger  wrote:
>>>   
>>>> VFIO currently exposes the usable IOVA regions through the
>>>> VFIO_IOMMU_GET_INFO ioctl / VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>>>> capability. However it fails to take into account the dma_mask
>>>> of the devices within the container. The top limit currently is
>>>> defined by the iommu aperture.  
>>>
>>> I think that dma_mask is traditionally a DMA API interface for a device
>>> driver to indicate to the DMA layer which mappings are accessible to the
>>> device.  On the other hand, vfio makes use of the IOMMU API where the
>>> driver is in userspace.  That userspace driver has full control of the
>>> IOVA range of the device, therefore dma_mask is mostly irrelevant to
>>> vfio.  I think the issue you're trying to tackle is that the IORT code
>>> is making use of the dma_mask to try to describe a DMA address
>>> limitation imposed by the PCI root bus, living between the endpoint
>>> device and the IOMMU.  Therefore, if the IORT code is exposing a
>>> topology or system imposed device limitation, this seems much more akin
>>> to something like an MSI reserved range, where it's not necessarily the
>>> device or the IOMMU with the limitation, but something that sits
>>> between them.  
>>
>> First I think I failed to explain the context. I worked on NVMe
>> passthrough on ARM. The QEMU NVMe backend uses VFIO to program the
>> physical device. The IOVA allocator there currently uses an IOVA range
>> within [0x1, 1ULL << 39]. This IOVA layout is rather arbitrary if I
>> understand correctly.
> 
> 39 bits is the minimum available on some VT-d systems, so it was
> probably considered a reasonable minimum address width to consider.
OK
> 
>> I noticed we rapidly get some VFIO MAP DMA
>> failures because the allocated IOVA collide with the ARM MSI reserved
>> IOVA window [0x800, 0x810]. Since  9b77e5c79840 ("vfio/type1:
>> Check reserved region conflict and update iova list"), such VFIO MAP DMA
>> attempts to map IOVAs belonging to host reserved IOVA windows fail. So,
>> by using the VFIO_IOMMU_GET_INFO ioctl /
>> VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE I can change the IOVA allocator to
>> avoid allocating within this range and others. While working on this, I
>> tried to automatically compute the min/max IOVAs and change the
>> arbitrary [0x1, 1ULL << 39]. My SMMUv2 supports up to 48b so
>> naturally the max IOVA was computed as 1ULL << 48. The QEMU NVMe backend
>> allocates at the bottom and at the top of the range. I noticed the use
>> case was not working as soon as the top IOVA was more than 1ULL << 42.
>> And then we noticed the dma_mask was set to 42 by using
>> cat  /sys/bus/pci/devices/0005:01:00.0/dma_mask_bits. So my
>> interpretation is the dma_mask was somehow containing the info the
>> device couldn't handle IOVAs beyond a certain limit.
> 
> I see that there are both OF and ACPI hooks in pci_dma_configure() and
> both modify dev->dma_mask, which is what pci-sysfs is exposing here,
> but I'm not convinced this even does what it's intended to do.  The
> driver core calls this via the bus->dma_configure callback before
> probing a driver, but then what happens when the driver calls
> pci_set_dma_mask()?  This is just a wrapper for dma_set_mask() and I
> don't see anywhere that would take into account the existing
> dev->dma_mask.  It seems for example that pci_dma_configure() could
> produce a 42 bit mask as we have here, then the driver could override
> that with anything that the dma_ops.dma_supported() callback finds
> acceptable, and I don't see any instances where the current
> dev->dma_mask is considered.  Am I overlooking something? 

I don't see it either. So the dma_mask set by the driver would never be
checked against the dma_mask limit found when parsing OF/ACPI?
>  
>> In my case the 42b limit is computed in iort_dma_setup() by
>> acpi_dma_get_range(dev, &dmaaddr, &offset, &size);
>>
>> Referring to the comment, it does "Evaluate DMA regions and return
>> respectively DMA region start, offset and size in dma_addr, offset and
>> size on parsing success". This parses the ACPI table, looking for ACPI
>> companions with _DMA methods.
>>

Re: [PATCH v12 6/6] iommu/vt-d: Check UAPI data processed by IOMMU core

2020-09-29 Thread Auger Eric
Hi Jacob,

On 9/25/20 6:32 PM, Jacob Pan wrote:
> IOMMU generic layer already does sanity checks on UAPI data for version
> match and argsz range based on generic information.
> 
> This patch adjusts the following data checking responsibilities:
> - removes the redundant version check from VT-d driver
> - removes the check for vendor specific data size
> - adds check for the use of reserved/undefined flags
> 
> Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  drivers/iommu/intel/iommu.c |  3 +--
>  drivers/iommu/intel/svm.c   | 11 +--
>  include/uapi/linux/iommu.h  |  1 +
>  3 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 461f3a6864d4..18ed3b3c70d7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5408,8 +5408,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
> struct device *dev,
>   int ret = 0;
>   u64 size = 0;
>  
> - if (!inv_info || !dmar_domain ||
> - inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + if (!inv_info || !dmar_domain)
>   return -EINVAL;
>  
>   if (!dev || !dev_is_pci(dev))
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 99353d6468fa..0cb9a15f1112 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -284,8 +284,15 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
> struct device *dev,
>   if (WARN_ON(!iommu) || !data)
>   return -EINVAL;
>  
> - if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> - data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + if (data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + return -EINVAL;
> +
> + /* IOMMU core ensures argsz is more than the start of the union */
> + if (data->argsz < offsetofend(struct iommu_gpasid_bind_data, 
> vendor.vtd))
> + return -EINVAL;
> +
> + /* Make sure no undefined flags are used in vendor data */
> + if (data->vendor.vtd.flags & ~(IOMMU_SVA_VTD_GPASID_LAST - 1))
>   return -EINVAL;
>  
>   if (!dev_is_pci(dev))
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 66d4ca40b40f..e1d9e75f2c94 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -288,6 +288,7 @@ struct iommu_gpasid_bind_data_vtd {
>  #define IOMMU_SVA_VTD_GPASID_PWT (1 << 3) /* page-level write through */
>  #define IOMMU_SVA_VTD_GPASID_EMTE(1 << 4) /* extended mem type enable */
>  #define IOMMU_SVA_VTD_GPASID_CD  (1 << 5) /* PASID-level cache 
> disable */
> +#define IOMMU_SVA_VTD_GPASID_LAST(1 << 6)
>   __u64 flags;
>   __u32 pat;
>   __u32 emt;
> 
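
The reserved-flag check in the patch above relies on a sentinel bit placed
one position past the highest defined flag. A minimal userspace sketch of
that idiom follows; the DEMO_* names and the helper are hypothetical and
only mirror the shape of the VT-d defines, they are not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag layout: bits 0..5 are defined, bit 6 is the sentinel. */
#define DEMO_FLAG_PWT   (1u << 3)
#define DEMO_FLAG_EMTE  (1u << 4)
#define DEMO_FLAG_CD    (1u << 5)
#define DEMO_FLAG_LAST  (1u << 6)	/* one past the highest defined flag */

/*
 * Return true when every set bit in @flags lies below the sentinel @last.
 * (last - 1) has all bits below the sentinel set, so its complement
 * selects exactly the undefined/reserved bits.
 */
static bool demo_flags_are_defined(uint64_t flags, uint64_t last)
{
	return (flags & ~(last - 1)) == 0;
}
```

Adding a new flag then only requires moving the sentinel up by one bit; the
check itself stays unchanged.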

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 5/6] iommu/uapi: Handle data and argsz filled by users

2020-09-29 Thread Auger Eric
Hi Jacob,

On 9/25/20 6:32 PM, Jacob Pan wrote:
> IOMMU user APIs are responsible for processing user data. This patch
> changes the interface such that user pointers can be passed into IOMMU
> code directly. Separate kernel APIs without user pointers are introduced
> for in-kernel users of the UAPI functionality.
> 
> IOMMU UAPI data has a user filled argsz field which indicates the data
> length of the structure. User data is not trusted, argsz must be
> validated based on the current kernel data size, mandatory data size,
> and feature flags.
> 
> User data may also be extended, resulting in possible argsz increase.
> Backward compatibility is ensured based on size and flags (or
> the functional equivalent fields) checking.
> 
> This patch adds sanity checks in the IOMMU layer. In addition to argsz,
> reserved/unused fields in padding, flags, and version are also checked.
> Details are documented in Documentation/userspace-api/iommu.rst
> 
> Reviewed-by: Jean-Philippe Brucker 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  drivers/iommu/iommu.c  | 194 
> +++--
>  include/linux/iommu.h  |  28 ---
>  include/uapi/linux/iommu.h |   1 +
>  3 files changed, 207 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 4ae02291ccc2..a11f2733dc54 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1961,34 +1961,214 @@ int iommu_attach_device(struct iommu_domain *domain, 
> struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +/*
> + * Check flags and other user provided data for valid combinations. We also
> + * make sure no reserved fields or unused flags are set. This is to ensure
> + * not breaking userspace in the future when these fields or flags are used.
> + */
> +static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info 
> *info)
> +{
> + u32 mask;
> + int i;
> +
> + if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + return -EINVAL;
> +
> + mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
> + if (info->cache & ~mask)
> + return -EINVAL;
> +
> + if (info->granularity >= IOMMU_INV_GRANU_NR)
> + return -EINVAL;
> +
> + switch (info->granularity) {
> + case IOMMU_INV_GRANU_ADDR:
> + if (info->cache & IOMMU_CACHE_INV_TYPE_PASID)
> + return -EINVAL;
> +
> + mask = IOMMU_INV_ADDR_FLAGS_PASID |
> + IOMMU_INV_ADDR_FLAGS_ARCHID |
> + IOMMU_INV_ADDR_FLAGS_LEAF;
> +
> + if (info->granu.addr_info.flags & ~mask)
> + return -EINVAL;
> + break;
> + case IOMMU_INV_GRANU_PASID:
> + mask = IOMMU_INV_PASID_FLAGS_PASID |
> + IOMMU_INV_PASID_FLAGS_ARCHID;
> + if (info->granu.pasid_info.flags & ~mask)
> + return -EINVAL;
> +
> + break;
> + case IOMMU_INV_GRANU_DOMAIN:
> + if (info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB)
> + return -EINVAL;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + /* Check reserved padding fields */
> + for (i = 0; i < sizeof(info->padding); i++) {
> + if (info->padding[i])
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
>  int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device 
> *dev,
> - struct iommu_cache_invalidate_info *inv_info)
> + void __user *uinfo)
>  {
> + struct iommu_cache_invalidate_info inv_info = { 0 };
> + u32 minsz;
> + int ret;
> +
>   if (unlikely(!domain->ops->cache_invalidate))
>   return -ENODEV;
>  
> - return domain->ops->cache_invalidate(domain, dev, inv_info);
> + /*
> +  * No new spaces can be added before the variable sized union, the
> +  * minimum size is the offset to the union.
> +  */
> + minsz = offsetof(struct iommu_cache_invalidate_info, granu);
> +
> + /* Copy minsz from user to get flags and argsz */
> + if (copy_from_user(&inv_info, uinfo, minsz))
> + return -EFAULT;
> +
> + /* Fields before the variable size union are mandatory */
> + if (inv_info.argsz < minsz)
> + return -EINVAL;
> +
> + /* PASID and address granu require additional info beyond minsz */
> + if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> + inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, 
> granu.pasid_info))
> + return -EINVAL;
> +
> + if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> + inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, 
> granu.addr_info))
> + return -EINVAL;
> +
> + /*
> +  * User might be using a 

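The argsz handling described in the commit message can be modelled in
userspace roughly as below. This is a sketch under assumptions: struct
demo_info, the DEMO_* constants and demo_copy_info() are hypothetical,
memcpy() with a length check stands in for copy_from_user(), and the
"copy no more than the known size" step is assumed because the quoted
patch is cut off at that point.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical UAPI-style struct: fixed header, then a variable-size union. */
struct demo_info {
	uint32_t argsz;		/* size filled in by the caller */
	uint32_t version;
	uint8_t  granularity;	/* selects the union member */
	uint8_t  padding[7];	/* reserved, must be zero */
	union {
		struct { uint64_t pasid; } pasid_info;
		struct { uint64_t pasid; uint64_t addr; uint64_t nb; } addr_info;
	} granu;
};

#define DEMO_GRANU_DOMAIN 0
#define DEMO_GRANU_PASID  1
#define DEMO_GRANU_ADDR   2

#define demo_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Validate and copy caller data; @ubuf_len models how much is readable. */
static int demo_copy_info(struct demo_info *dst, const void *ubuf,
			  size_t ubuf_len)
{
	const size_t minsz = offsetof(struct demo_info, granu);
	size_t copysz, i;

	memset(dst, 0, sizeof(*dst));

	/* Stand-in for copy_from_user(): the header must be readable. */
	if (ubuf_len < minsz)
		return -EFAULT;
	memcpy(dst, ubuf, minsz);

	/* Fields before the variable-size union are mandatory. */
	if (dst->argsz < minsz)
		return -EINVAL;

	/* PASID and address granularity need data beyond minsz. */
	if (dst->granularity == DEMO_GRANU_PASID &&
	    dst->argsz < demo_offsetofend(struct demo_info, granu.pasid_info))
		return -EINVAL;
	if (dst->granularity == DEMO_GRANU_ADDR &&
	    dst->argsz < demo_offsetofend(struct demo_info, granu.addr_info))
		return -EINVAL;

	/* A caller may pass a larger struct; copy only the size known here. */
	copysz = dst->argsz < sizeof(*dst) ? dst->argsz : sizeof(*dst);
	if (ubuf_len < copysz)
		return -EFAULT;
	memcpy((char *)dst + minsz, (const char *)ubuf + minsz, copysz - minsz);

	/* Reserved padding must stay zero so it can be reused later. */
	for (i = 0; i < sizeof(dst->padding); i++)
		if (dst->padding[i])
			return -EINVAL;

	return 0;
}
```

Reading the fixed header first is what makes the later checks possible:
argsz and granularity are only trusted once they have been copied into
local storage.
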
Re: [RFC 2/3] iommu: Account for dma_mask and iommu aperture in IOVA reserved regions

2020-09-29 Thread Auger Eric
Hi Christoph,

On 9/29/20 8:03 AM, Christoph Hellwig wrote:
> On Mon, Sep 28, 2020 at 09:50:36PM +0200, Eric Auger wrote:
>> VFIO currently exposes the usable IOVA regions through the
>> VFIO_IOMMU_GET_INFO ioctl. However it fails to take into account
>> the dma_mask of the devices within the container. The top limit
>> currently is defined by the iommu aperture.
> 
> Can we take a step back here?  The dma_mask only has a meaning for
> the DMA API, and not the iommu API, it should have no relevance here.
> 
> More importantly if we are using vfio no dma_mask should be set to
> start with.

You will find more context in my reply to Alex.

Thanks

Eric
> 
>> +if (geo.aperture_end < ULLONG_MAX && geo.aperture_end != 
>> geo.aperture_start) {
> 
> Please avoid pointlessly overlong lines.
> 



Re: [RFC 0/3] iommu: Reserved regions for IOVAs beyond dma_mask and iommu aperture

2020-09-29 Thread Auger Eric
Hi all,

[also correcting some outdated email addresses + adding Lorenzo in cc]

On 9/29/20 12:42 AM, Alex Williamson wrote:
> On Mon, 28 Sep 2020 21:50:34 +0200
> Eric Auger  wrote:
> 
>> VFIO currently exposes the usable IOVA regions through the
>> VFIO_IOMMU_GET_INFO ioctl / VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> capability. However it fails to take into account the dma_mask
>> of the devices within the container. The top limit currently is
>> defined by the iommu aperture.
> 
> I think that dma_mask is traditionally a DMA API interface for a device
> driver to indicate to the DMA layer which mappings are accessible to the
> device.  On the other hand, vfio makes use of the IOMMU API where the
> driver is in userspace.  That userspace driver has full control of the
> IOVA range of the device, therefore dma_mask is mostly irrelevant to
> vfio.  I think the issue you're trying to tackle is that the IORT code
> is making use of the dma_mask to try to describe a DMA address
> limitation imposed by the PCI root bus, living between the endpoint
> device and the IOMMU.  Therefore, if the IORT code is exposing a
> topology or system imposed device limitation, this seems much more akin
> to something like an MSI reserved range, where it's not necessarily the
> device or the IOMMU with the limitation, but something that sits
> between them.

First, I think I failed to explain the context. I worked on NVMe
passthrough on ARM. The QEMU NVMe backend uses VFIO to program the
physical device. The IOVA allocator there currently uses an IOVA range
within [0x1, 1ULL << 39]. This IOVA layout is rather arbitrary, if I
understand correctly. I noticed we rapidly get some VFIO MAP DMA
failures because the allocated IOVAs collide with the ARM MSI reserved
IOVA window [0x800, 0x810]. Since 9b77e5c79840 ("vfio/type1:
Check reserved region conflict and update iova list"), VFIO MAP DMA
attempts to map IOVAs belonging to host reserved IOVA windows fail. So,
by using the VFIO_IOMMU_GET_INFO ioctl /
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE capability, I can change the IOVA
allocator to avoid allocating within this range and others. While
working on this, I tried to automatically compute the min/max IOVAs and
replace the arbitrary [0x1, 1ULL << 39]. My SMMUv2 supports up to 48b,
so naturally the max IOVA was computed as 1ULL << 48. The QEMU NVMe
backend allocates at the bottom and at the top of the range. I noticed
the use case stopped working as soon as the top IOVA was more than
1ULL << 42. We then noticed the dma_mask was set to 42 bits by running
cat /sys/bus/pci/devices/0005:01:00.0/dma_mask_bits. So my
interpretation is that the dma_mask somehow carried the information that
the device couldn't handle IOVAs beyond a certain limit.

In my case the 42b limit is computed in iort_dma_setup() by
acpi_dma_get_range(dev, &dmaaddr, &offset, &size);

Referring to the comment, it does "Evaluate DMA regions and return
respectively DMA region start, offset and size in dma_addr, offset and
size on parsing success". This parses the ACPI table, looking for ACPI
companions with _DMA methods.

But as Alex mentioned, the IORT also allows defining limits on "the
number of address bits, starting from the least significant bit that can
be generated by a device when it accesses memory". See the "Device
Memory Address Size limit" field of the Named component node, or the
"Memory address size limit" field of the PCI root complex node.

ret = acpi_dma_get_range(dev, &dmaaddr, &offset, &size);
if (ret == -ENODEV)
	ret = dev_is_pci(dev) ? rc_dma_get_range(dev, &size)
			      : nc_dma_get_range(dev, &size);

So eventually the information collected from the ACPI tables, which does
impact the usable IOVA range, seems to be stored in the dma_mask, hence
this proposal.
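
To make the arithmetic concrete, here is a small sketch of how a reserved
region above a device address limit could be derived from the IOMMU
aperture. It only illustrates the reasoning in this mail; the demo_* names
are hypothetical, and whether dma_mask is the right source for the limit is
exactly what is being debated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive */
};

/* Highest address covered by an n-bit limit, e.g. 42 -> 2^42 - 1. */
static uint64_t demo_mask_from_bits(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * Compute the part of the aperture that lies above the device limit.
 * Returns false when the limit already covers the whole aperture.
 */
static bool demo_resv_above_limit(uint64_t aperture_end, uint64_t limit,
				  struct demo_range *resv)
{
	if (limit >= aperture_end)
		return false;

	resv->start = limit + 1;
	resv->end = aperture_end;
	return true;
}
```

With a 48-bit aperture and a 42-bit limit, the reserved range would start
at 1ULL << 42 and end at the last address of the aperture.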

> 
>> So, for instance, if the IOMMU supports up to 48bits, it may give
>> the impression the max IOVA is 48b while a device may have a
>> dma_mask of 42b. So this API cannot really be used to compute
>> the max usable IOVA.
>>
>> This patch removes the IOVA region beyond the dma_mask's.
> 
> Rather it adds a reserved region accounting for the range above the
> device's dma_mask.

Yep. It adds new reserved regions in
/sys/kernel/iommu_groups//reserved_regions and removes those from the
usable regions exposed by VFIO GET_INFO.

> I don't think the IOMMU API should be consuming
> dma_mask like this though.  For example, what happens in
> pci_dma_configure() when there are no OF or ACPI DMA restrictions?
My guess was that the dma_mask was set to the max range but I did not
test it.
> It
> appears to me that the dma_mask from whatever previous driver had the
> device carries over to the new driver.  That's generally ok for the DMA
> API because a driver is required to set the device's DMA mask.  It
> doesn't make sense however to blindly consume that dma_mask and export
> it via an IOMMU API.  For example I would expect to see different
> results depending on whether a host driver has been bound to a device.
> It seems the correct IOMMU API 

Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Auger Eric
Hi,

Adding Al in the loop

On 9/24/20 11:38 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
>> On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
>>> OK so this looks good. Can you pls repost with the minor tweak
>>> suggested and all acks included, and I will queue this?
>>
>> My NACK still stands, as long as a few questions are open:
>>
>>  1) The format used here will be the same as in the ACPI table? I
>> think the answer to this questions must be Yes, so this leads
>> to the real question:
> 
> I am not sure it's a must.
> We can always tweak the parser if there are slight differences
> between ACPI and virtio formats.
> 
> But we do want the virtio format used here to be approved by the virtio
> TC, so it won't change.
> 
> Eric, Jean-Philippe, does one of you intend to create a github issue
> and request a ballot for the TC? It's been posted end of August with no
> changes ...
Jean-Philippe, would you?
> 
>>  2) Has the ACPI table format stabilized already? If and only if
>> the answer is Yes I will Ack these patches. We don't need to
>> wait until the ACPI table format is published in a
>> specification update, but at least some certainty that it
>> will not change in incompatible ways anymore is needed.
>>

Al, do you have any news about the VIOT definition submission to
the UEFI ASWG?

Thank you in advance

Best Regards

Eric


> 
> Not that I know, but I don't see why it's a must.
> 
>> So what progress has been made with the ACPI table specification, is it
>> just a matter of time to get it approved or are there concerns?
>>
>> Regards,
>>
>>  Joerg
> 



Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

2020-09-23 Thread Auger Eric
Hi Zenghui,

On 9/23/20 1:27 PM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2020/3/21 0:19, Eric Auger wrote:
>> From: "Liu, Yi L" 
>>
>> This patch adds an VFIO_IOMMU_SET_PASID_TABLE ioctl
>> which aims to pass the virtual iommu guest configuration
>> to the host. This latter takes the form of the so-called
>> PASID table.
>>
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>> b/drivers/vfio/vfio_iommu_type1.c
>> index a177bf2c6683..bfacbd876ee1 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -2172,6 +2172,43 @@ static int vfio_iommu_iova_build_caps(struct
>> vfio_iommu *iommu,
>>   return ret;
>>   }
>>   +static void
>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>> +{
>> +    struct vfio_domain *d;
>> +
>> +    mutex_lock(&iommu->lock);
>> +
>> +    list_for_each_entry(d, &iommu->domain_list, next) {
>> +        iommu_detach_pasid_table(d->domain);
>> +    }
>> +    mutex_unlock(&iommu->lock);
>> +}
>> +
>> +static int
>> +vfio_attach_pasid_table(struct vfio_iommu *iommu,
>> +    struct vfio_iommu_type1_set_pasid_table *ustruct)
>> +{
>> +    struct vfio_domain *d;
>> +    int ret = 0;
>> +
>> +    mutex_lock(&iommu->lock);
>> +
>> +    list_for_each_entry(d, &iommu->domain_list, next) {
>> +        ret = iommu_attach_pasid_table(d->domain, &ustruct->config);
>> +        if (ret)
>> +            goto unwind;
>> +    }
>> +    goto unlock;
>> +unwind:
>> +    list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
>> +        iommu_detach_pasid_table(d->domain);
>> +    }
>> +unlock:
>> +    mutex_unlock(&iommu->lock);
>> +    return ret;
>> +}
>> +
>>   static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  unsigned int cmd, unsigned long arg)
>>   {
>> @@ -2276,6 +2313,25 @@ static long vfio_iommu_type1_ioctl(void
>> *iommu_data,
>>     return copy_to_user((void __user *)arg, &unmap, minsz) ?
>>   -EFAULT : 0;
>> +    } else if (cmd == VFIO_IOMMU_SET_PASID_TABLE) {
>> +    struct vfio_iommu_type1_set_pasid_table ustruct;
>> +
>> +    minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table,
>> +    config);
>> +
>> +    if (copy_from_user(&ustruct, (void __user *)arg, minsz))
>> +    return -EFAULT;
>> +
>> +    if (ustruct.argsz < minsz)
>> +    return -EINVAL;
>> +
>> +    if (ustruct.flags & VFIO_PASID_TABLE_FLAG_SET)
>> +    return vfio_attach_pasid_table(iommu, &ustruct);
>> +    else if (ustruct.flags & VFIO_PASID_TABLE_FLAG_UNSET) {
>> +    vfio_detach_pasid_table(iommu);
>> +    return 0;
>> +    } else
>> +    return -EINVAL;
> 
> Nit:
> 
> What if user-space blindly set both flags? Should we check that only one
> flag is allowed to be set at this stage, and return error otherwise?
Indeed I can check that.
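
One way to express the check Zenghui suggests is to reject anything but
exactly one known flag. This is only a sketch: the DEMO_* flag values and
demo_decode_flags() are hypothetical and do not reflect the actual
VFIO_PASID_TABLE_FLAG_* definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values standing in for the SET/UNSET flags. */
#define DEMO_PASID_TABLE_FLAG_SET	(1u << 0)
#define DEMO_PASID_TABLE_FLAG_UNSET	(1u << 1)
#define DEMO_PASID_TABLE_FLAG_MASK \
	(DEMO_PASID_TABLE_FLAG_SET | DEMO_PASID_TABLE_FLAG_UNSET)

enum demo_action {
	DEMO_ATTACH = 1,
	DEMO_DETACH = 2,
};

/* Return the requested action, or -EINVAL for none, both, or unknown bits. */
static int demo_decode_flags(uint32_t flags)
{
	if (flags & ~DEMO_PASID_TABLE_FLAG_MASK)
		return -EINVAL;

	switch (flags) {
	case DEMO_PASID_TABLE_FLAG_SET:
		return DEMO_ATTACH;
	case DEMO_PASID_TABLE_FLAG_UNSET:
		return DEMO_DETACH;
	default:
		/* Neither flag or both flags set. */
		return -EINVAL;
	}
}
```

Switching on the full flags value, instead of testing each bit in turn,
makes the "both set" case fall through to the error path naturally.
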
> 
> Besides, before going through the whole series [1][2], I'd like to know
> if this is the latest version of your Nested-Stage-Setup work in case I
> had missed something.
> 
> [1] https://lore.kernel.org/r/20200320161911.27494-1-eric.au...@redhat.com
> [2] https://lore.kernel.org/r/20200414150607.28488-1-eric.au...@redhat.com

Yes, those 2 series are the latest ones. Thank you for reviewing.

FYI, I intend to respin within a week or 2 on top of Jacob's [PATCH v10
0/7] IOMMU user API enhancement. But functionally there won't be a lot of
changes.

Thanks

Eric
> 
> 
> Thanks,
> Zenghui
> 


Re: [PATCH v2] iommu/arm: Add module parameter to set msi iova address

2020-09-23 Thread Auger Eric
Hi Will,

On 9/21/20 10:45 PM, Will Deacon wrote:
> On Mon, Sep 14, 2020 at 11:13:07AM -0700, Vennila Megavannan wrote:
>> From: Srinath Mannam 
>>
>> Add provision to change default value of MSI IOVA base to platform's
>> suitable IOVA using module parameter. The present hardcoded MSI IOVA base
>> may not be the accessible IOVA ranges of platform.
>>
>> If any platform has the limitation to access default MSI IOVA, then it can
>> be changed using "arm-smmu.msi_iova_base=0xa000" command line argument.
>>
>> Signed-off-by: Srinath Mannam 
>> Co-developed-by: Vennila Megavannan 
>> Signed-off-by: Vennila Megavannan 
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 -
>>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 5 -
>>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> This feels pretty fragile. Wouldn't it be better to realise that there's
> a region conflict with iommu_dma_get_resv_regions() and move the MSI window
> accordingly at runtime?

Since cd2c9fcf5c66 ("iommu/dma: Move PCI window region reservation back
into dma specific path"), the PCI host bridge windows are not exposed by
iommu_dma_get_resv_regions() anymore. If I understood correctly, the aim
here is to avoid a collision between such a PCI host bridge window and
the MSI IOVA range.
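
Detecting such a collision boils down to an interval overlap test. A
minimal sketch, with a hypothetical struct demo_region (start plus length)
and helper name, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region description: start address and length in bytes. */
struct demo_region {
	uint64_t start;
	uint64_t length;
};

/* Return true when the two regions share at least one address. */
static bool demo_regions_overlap(const struct demo_region *a,
				 const struct demo_region *b)
{
	uint64_t a_end, b_end;

	/* An empty region cannot overlap anything. */
	if (!a->length || !b->length)
		return false;

	/* Inclusive end addresses avoid overflow at the top of the space. */
	a_end = a->start + a->length - 1;
	b_end = b->start + b->length - 1;

	return a->start <= b_end && b->start <= a_end;
}
```

A runtime approach like the one Will suggests would run such a test against
each reserved region before settling on an MSI window.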

Thanks

Eric
> 
> Will
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Auger Eric
Hi,
On 9/16/20 6:32 PM, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 06:20:52PM +0200, Jean-Philippe Brucker wrote:
>> On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote:
>>> On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
>>>> And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
>>>> the PASID space of a PCI function cannot be shared between host and guest,
>>>> so we assign the whole PASID table along with the RID. Since we need the
>>>> BIND, INVALIDATE, and report APIs introduced here to support nested
>>>> translation, a /dev/sva interface would need to support this mode as well.
>>>
>>> Well, that means this HW cannot support PASID capable 'SIOV' style
>>> devices in guests.
>>
>> It does not yet support Intel SIOV, no. It does support the standards,
>> though: PCI SR-IOV to partition a device and PASIDs in a guest.
> 
> SIOV is basically standards based, it is better thought of as a
> cookbook on how to use PASID and IOMMU together.
> 
>>> I admit whole function PASID delegation might be something vfio-pci
>>> should handle - but only if it really doesn't fit in some /dev/sva
>>> after we cover the other PASID cases.
>>
>> Wouldn't that be the duplication you're trying to avoid?  A second
>> channel for bind, invalidate, capability and fault reporting
>> mechanisms?
> 
> Yes, which is why it seems like it would be nicer to avoid it. Why I
> said "might" :)
> 
>> If we extract SVA parts of vfio_iommu_type1 into a separate chardev,
>> PASID table pass-through [1] will have to use that.
> 
> Yes, '/dev/sva' (which is a terrible name) would want to be the uAPI
> entry point for controlling the vIOMMU related to PASID.
> 
> Does anything in the [1] series have tight coupling to VFIO other than
> needing to know a bus/device/function? It looks like it is mostly
> exposing iommu_* functions as uAPI?

This series does not use any PASID, so it fits quite nicely into the VFIO
framework, I think. Besides cache invalidation, which takes the struct
device, the other operations (MSI binding and PASID table passing) operate
on the iommu domain. Also, we use the VFIO memory region and
interrupt/eventfd registration mechanism to return faults.

Thanks

Eric
> 
> Jason
> 



Re: [PATCH v2 5/9] iommu/ioasid: Introduce ioasid_set private ID

2020-09-10 Thread Auger Eric
Hi Jacob,

On 9/9/20 12:40 AM, Jacob Pan wrote:
> On Tue, 1 Sep 2020 17:38:44 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>> On 8/22/20 6:35 AM, Jacob Pan wrote:
>>> When an IOASID set is used for guest SVA, each VM will acquire its
>>> ioasid_set for IOASID allocations. IOASIDs within the VM must have a
>>> host/physical IOASID backing, mapping between guest and host
>>> IOASIDs can be non-identical. IOASID set private ID (SPID) is
>>> introduced in this patch to be used as guest IOASID. However, the
>>> concept of ioasid_set specific namespace is generic, thus named
>>> SPID.
>>>
>>> As SPID namespace is within the IOASID set, the IOASID core can
>>> provide lookup services at both directions. SPIDs may not be
>>> allocated when its IOASID is allocated, the mapping between SPID
>>> and IOASID is usually established when a guest page table is bound
>>> to a host PASID.
>>>
>>> Signed-off-by: Jacob Pan 
>>> ---
>>>  drivers/iommu/ioasid.c | 54 ++
>>>  include/linux/ioasid.h | 12 +++
>>>  2 files changed, 66 insertions(+)
>>>
>>> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
>>> index 5f31d63c75b1..c0aef38a4fde 100644
>>> --- a/drivers/iommu/ioasid.c
>>> +++ b/drivers/iommu/ioasid.c
>>> @@ -21,6 +21,7 @@ enum ioasid_state {
>>>   * struct ioasid_data - Meta data about ioasid
>>>   *
>>>   * @id:Unique ID
>>> + * @spid:  Private ID unique within a set
>>>   * @users  Number of active users
>>>   * @state  Track state of the IOASID
>>>   * @setMeta data of the set this IOASID belongs to
>>> @@ -29,6 +30,7 @@ enum ioasid_state {
>>>   */
>>>  struct ioasid_data {
>>> ioasid_t id;
>>> +   ioasid_t spid;
>>> struct ioasid_set *set;
>>> refcount_t users;
>>> enum ioasid_state state;
>>> @@ -326,6 +328,58 @@ int ioasid_attach_data(ioasid_t ioasid, void
>>> *data) EXPORT_SYMBOL_GPL(ioasid_attach_data);
>>>  
>>>  /**
>>> + * ioasid_attach_spid - Attach ioasid_set private ID to an IOASID
>>> + *
>>> + * @ioasid: the ID to attach
>>> + * @spid:   the ioasid_set private ID of @ioasid
>>> + *
>>> + * For IOASID that is already allocated, private ID within the set
>>> can be
>>> + * attached via this API. Future lookup can be done via
>>> ioasid_find.  
>> I would remove "For IOASID that is already allocated, private ID
>> within the set can be attached via this API"
> I guess it is implied. Will remove.
> 
>>> + */
>>> +int ioasid_attach_spid(ioasid_t ioasid, ioasid_t spid)
>>> +{
>>> +   struct ioasid_data *ioasid_data;
>>> +   int ret = 0;
>>> +
>>> +   spin_lock(&ioasid_allocator_lock);
>> We keep on saying the SPID is local to an IOASID set but we don't
>> check any IOASID set contains this ioasid. It looks a bit weird to me.
> We store ioasid_set inside ioasid_data when an IOASID is allocated, so
> we don't need to search all the ioasid_sets. Perhaps I missed your
> point?
No I think it became clearer ;-)
> 
>>> +   ioasid_data = xa_load(&active_allocator->xa, ioasid);
>>> +
>>> +   if (!ioasid_data) {
>>> +   pr_err("No IOASID entry %d to attach SPID %d\n",
>>> +   ioasid, spid);
>>> +   ret = -ENOENT;
>>> +   goto done_unlock;
>>> +   }
>>> +   ioasid_data->spid = spid;  
>> is there any way/need to remove an spid binding?
> For guest SVA, we attach SPID as a guest PASID during bind guest page
> table. Unbind does the opposite, ioasid_attach_spid() with
> spid=INVALID_IOASID clears the binding.
> 
> Perhaps add more symmetric functions? i.e.
> ioasid_detach_spid(ioasid_t ioasid)
> ioasid_attach_spid(struct ioasid_set *set, ioasid_t ioasid)
yep make sense

Thanks

Eric
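
For readers following the SPID discussion, the symmetric attach/detach
pair Jacob proposes, plus the reverse lookup, can be modelled in userspace
as below. Everything here is a hypothetical sketch: a fixed table replaces
the xarray, an integer set_id replaces struct ioasid_set, locking is
omitted, and the error codes for a set mismatch are an assumption.

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t demo_ioasid_t;

#define DEMO_INVALID_IOASID	((demo_ioasid_t)-1)
#define DEMO_NR_IOASIDS		8

struct demo_ioasid_data {
	int allocated;
	int set_id;		/* set this IOASID belongs to */
	demo_ioasid_t spid;	/* set-private ID, e.g. a guest PASID */
};

static struct demo_ioasid_data demo_table[DEMO_NR_IOASIDS];

/* Mark @id as allocated within @set_id, with no SPID attached yet. */
static int demo_alloc(demo_ioasid_t id, int set_id)
{
	if (id >= DEMO_NR_IOASIDS || demo_table[id].allocated)
		return -EINVAL;

	demo_table[id].allocated = 1;
	demo_table[id].set_id = set_id;
	demo_table[id].spid = DEMO_INVALID_IOASID;
	return 0;
}

/* Attach @spid to @id, checking that @id really belongs to @set_id. */
static int demo_attach_spid(int set_id, demo_ioasid_t id, demo_ioasid_t spid)
{
	if (id >= DEMO_NR_IOASIDS || !demo_table[id].allocated)
		return -ENOENT;
	if (demo_table[id].set_id != set_id)
		return -EINVAL;

	demo_table[id].spid = spid;
	return 0;
}

/* Clear the SPID binding of @id. */
static int demo_detach_spid(demo_ioasid_t id)
{
	if (id >= DEMO_NR_IOASIDS || !demo_table[id].allocated)
		return -ENOENT;

	demo_table[id].spid = DEMO_INVALID_IOASID;
	return 0;
}

/* Reverse lookup: find the IOASID in @set_id that carries @spid. */
static demo_ioasid_t demo_find_by_spid(int set_id, demo_ioasid_t spid)
{
	demo_ioasid_t id;

	if (spid == DEMO_INVALID_IOASID)
		return DEMO_INVALID_IOASID;

	for (id = 0; id < DEMO_NR_IOASIDS; id++) {
		if (demo_table[id].allocated &&
		    demo_table[id].set_id == set_id &&
		    demo_table[id].spid == spid)
			return id;
	}
	return DEMO_INVALID_IOASID;
}
```

Passing the set to the attach call makes the "SPID is local to a set"
property checkable at the API boundary, which is the concern raised above.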
> 
>>> +
>>> +done_unlock:
>>> +   spin_unlock(&ioasid_allocator_lock);
>>> +   return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(ioasid_attach_spid);
>>> +
>>> +ioasid_t ioasid_find_by_spid(struct ioasid_set *set, ioasid_t spid)
>>> +{
>>> +   struct ioasid_data *entry;
>>> +   unsigned long index;
>>> +
>>> +   if (!xa_load(&ioasid_sets, set->sid)) {
>>> +   pr_warn("Invalid set\n");
>>> +   return INVALID_IOASID;

Re: [PATCH v2 6/9] iommu/ioasid: Introduce notification APIs

2020-09-10 Thread Auger Eric
Hi Jacob,

On 9/10/20 12:58 AM, Jacob Pan wrote:
> On Tue, 1 Sep 2020 18:49:38 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>>
>> On 8/22/20 6:35 AM, Jacob Pan wrote:
>>> Relations among IOASID users largely follow a publisher-subscriber
>>> pattern. E.g. to support guest SVA on Intel Scalable I/O
>>> Virtualization (SIOV) enabled platforms, VFIO, IOMMU, device
>>> drivers, KVM are all users of IOASIDs. When a state change occurs,
>>> VFIO publishes the change event that needs to be processed by other
>>> users/subscribers.
>>>
>>> This patch introduced two types of notifications: global and per
>>> ioasid_set. The latter is intended for users who only needs to
>>> handle events related to the IOASID of a given set.
>>> For more information, refer to the kernel documentation at
>>> Documentation/ioasid.rst.
>>>
>>> Signed-off-by: Liu Yi L 
>>> Signed-off-by: Wu Hao 
>>> Signed-off-by: Jacob Pan 
>>> ---
>>>  drivers/iommu/ioasid.c | 280 -
>>>  include/linux/ioasid.h |  70 +
>>>  2 files changed, 348 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
>>> index c0aef38a4fde..6ddc09a7fe74 100644
>>> --- a/drivers/iommu/ioasid.c
>>> +++ b/drivers/iommu/ioasid.c
>>> @@ -9,8 +9,35 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  static DEFINE_XARRAY_ALLOC(ioasid_sets);
>>> +/*
>>> + * An IOASID could have multiple consumers where each consumeer
>>> may have  
>> can have multiple consumers
> Sounds good, I used past tense to describe a possibility :)
> 
>>> + * hardware contexts associated with IOASIDs.
>>> + * When a status change occurs, such as IOASID is being freed,
>>> notifier chains  
>> s/such as IOASID is being freed/, like on IOASID deallocation,
> Better, will do.
> 
>>> + * are used to keep the consumers in sync.
>>> + * This is a publisher-subscriber pattern where publisher can
>>> change the
>>> + * state of each IOASID, e.g. alloc/free, bind IOASID to a device
>>> and mm.
>>> + * On the other hand, subscribers gets notified for the state
>>> change and
>>> + * keep local states in sync.
>>> + *
>>> + * Currently, the notifier is global. A further optimization could
>>> be per
>>> + * IOASID set notifier chain.
>>> + */
>>> +static ATOMIC_NOTIFIER_HEAD(ioasid_chain);
>>> +
>>> +/* List to hold pending notification block registrations */
>>> +static LIST_HEAD(ioasid_nb_pending_list);
>>> +static DEFINE_SPINLOCK(ioasid_nb_lock);
>>> +struct ioasid_set_nb {
>>> +   struct list_headlist;
>>> +   struct notifier_block   *nb;
>>> +   void*token;
>>> +   struct ioasid_set   *set;
>>> +   boolactive;
>>> +};
>>> +
>>>  enum ioasid_state {
>>> IOASID_STATE_INACTIVE,
>>> IOASID_STATE_ACTIVE,
>>> @@ -394,6 +421,7 @@ EXPORT_SYMBOL_GPL(ioasid_find_by_spid);
>>>  ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
>>> ioasid_t max, void *private)
>>>  {
>>> +   struct ioasid_nb_args args;
>>> struct ioasid_data *data;
>>> void *adata;
>>> ioasid_t id = INVALID_IOASID;
>>> @@ -445,8 +473,14 @@ ioasid_t ioasid_alloc(struct ioasid_set *set,
>>> ioasid_t min, ioasid_t max, goto exit_free;
>>> }
>>> set->nr_ioasids++;
>>> -   goto done_unlock;
>>> +   args.id = id;
>>> +   /* Set private ID is not attached during allocation */
>>> +   args.spid = INVALID_IOASID;
>>> +   args.set = set;
>>> +   atomic_notifier_call_chain(&set->nh, IOASID_ALLOC, &args);
>>>  
>>> +   spin_unlock(&ioasid_allocator_lock);
>>> +   return id;  
>> spurious change
> Good catch. should just goto done_unlock.
> 
>>>  exit_free:
>>> kfree(data);
>>>  done_unlock:
>>> @@ -479,6 +513,7 @@ static void ioasid_do_free(struct ioasid_data
>>> *data) 
>>>  static void ioasid_free_locked(struct ioasid_set *set, ioasid_t
>>> ioasid) {
>>> +   struct ioasid_nb_args args;
>>> struct ioasid_data *data;
>>>  
>>> data = xa_load(&active_allocator->xa, ioasid);
>>> @@ -491,7 +526,

Re: [PATCH RESEND v9 02/13] iommu/ioasid: Add ioasid references

2020-09-08 Thread Auger Eric
Hi Jean,

On 8/17/20 7:15 PM, Jean-Philippe Brucker wrote:
> Let IOASID users take references to existing ioasids with ioasid_get().
> ioasid_put() drops a reference and only frees the ioasid when its
> reference number is zero. It returns true if the ioasid was freed.
> For drivers that don't call ioasid_get(), ioasid_put() is the same as
> ioasid_free().
> 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric
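
The get/put semantics described in the commit message can be illustrated
with a tiny userspace model. The struct and function names are
hypothetical, and a plain counter stands in for the kernel's refcount_t;
the point is only that put() reports whether the last reference was
dropped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an allocated IOASID entry. */
struct demo_ioasid {
	unsigned int refs;
	bool live;
};

/* Allocation hands out the entry with one reference held. */
static void demo_ioasid_init(struct demo_ioasid *e)
{
	e->refs = 1;
	e->live = true;
}

/* Take an extra reference on an existing entry. */
static void demo_ioasid_get(struct demo_ioasid *e)
{
	e->refs++;
}

/* Drop a reference; return true only when the entry was actually freed. */
static bool demo_ioasid_put(struct demo_ioasid *e)
{
	if (--e->refs)
		return false;

	e->live = false;
	return true;
}
```

A user that never calls the get helper sees put behave exactly like the old
free: the single reference from allocation is dropped and the entry goes
away.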

> ---
>  include/linux/ioasid.h  | 10 --
>  drivers/iommu/intel/iommu.c |  4 ++--
>  drivers/iommu/intel/svm.c   |  6 +++---
>  drivers/iommu/ioasid.c  | 38 +
>  4 files changed, 47 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 6f000d7a0ddc..e9dacd4b9f6b 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -34,7 +34,8 @@ struct ioasid_allocator_ops {
>  #if IS_ENABLED(CONFIG_IOASID)
>  ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> void *private);
> -void ioasid_free(ioasid_t ioasid);
> +void ioasid_get(ioasid_t ioasid);
> +bool ioasid_put(ioasid_t ioasid);
>  void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> bool (*getter)(void *));
>  int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> @@ -48,10 +49,15 @@ static inline ioasid_t ioasid_alloc(struct ioasid_set 
> *set, ioasid_t min,
>   return INVALID_IOASID;
>  }
>  
> -static inline void ioasid_free(ioasid_t ioasid)
> +static inline void ioasid_get(ioasid_t ioasid)
>  {
>  }
>  
> +static inline bool ioasid_put(ioasid_t ioasid)
> +{
> + return false;
> +}
> +
>  static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
>   bool (*getter)(void *))
>  {
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index e9864e52b0e9..152fc2dc17e0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5149,7 +5149,7 @@ static void auxiliary_unlink_device(struct dmar_domain 
> *domain,
>   domain->auxd_refcnt--;
>  
>   if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - ioasid_free(domain->default_pasid);
> + ioasid_put(domain->default_pasid);
>  }
>  
>  static int aux_domain_add_dev(struct dmar_domain *domain,
> @@ -5210,7 +5210,7 @@ static int aux_domain_add_dev(struct dmar_domain 
> *domain,
>   spin_unlock(&iommu->lock);
>   spin_unlock_irqrestore(&device_domain_lock, flags);
>   if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - ioasid_free(domain->default_pasid);
> + ioasid_put(domain->default_pasid);
>  
>   return ret;
>  }
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 95c3164a2302..50897a2bd1da 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -565,7 +565,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct 
> svm_dev_ops *ops,
>   if (mm) {
>   ret = mmu_notifier_register(&svm->notifier, mm);
>   if (ret) {
> - ioasid_free(svm->pasid);
> + ioasid_put(svm->pasid);
>   kfree(svm);
>   kfree(sdev);
>   goto out;
> @@ -583,7 +583,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct 
> svm_dev_ops *ops,
>   if (ret) {
>   if (mm)
>   mmu_notifier_unregister(&svm->notifier, mm);
> - ioasid_free(svm->pasid);
> + ioasid_put(svm->pasid);
>   kfree(svm);
>   kfree(sdev);
>   goto out;
> @@ -652,7 +652,7 @@ static int intel_svm_unbind_mm(struct device *dev, int 
> pasid)
>   kfree_rcu(sdev, rcu);
>  
>   if (list_empty(&svm->devs)) {
> - ioasid_free(svm->pasid);
> + ioasid_put(svm->pasid);
>   if (svm->mm)
>   mmu_notifier_unregister(&svm->notifier,
> svm->mm);
>   list_del(&svm->list);
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 0f8dd377aada..50ee27bbd04e 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -2,7 +2,7 @@
>  /*
>   * I/O Address Space ID allocator. There is one global IOASID space, split 
> into
>   * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
> - * free IOASIDs with ioasid_alloc and ioasid_free.
> + * free IOASIDs with ioasid_alloc and ioasid_put.
>   */
>  #include 
>  #include 
> @@ -15,6 +15,7 @@ struct ioasid_data {
>   struct ioasid_set *set;
>   void *private;
>   struct rcu_head rcu;
> + refcount_t refs;
>  };
> 

Re: [PATCH RESEND v9 11/13] iommu/arm-smmu-v3: Add SVA device feature

2020-09-08 Thread Auger Eric
3,16 @@ static int arm_smmu_attach_dev(struct iommu_domain 
> *domain, struct device *dev)
>   master = dev_iommu_priv_get(dev);
>   smmu = master->smmu;
>  
> + /*
> +  * Checking that SVA is disabled ensures that this device isn't bound to
> +  * any mm, and can be safely detached from its old domain. Bonds cannot
> +  * be removed concurrently since we're holding the group mutex.
> +  */
> + if (arm_smmu_master_sva_enabled(master)) {
> + dev_err(dev, "cannot attach - SVA enabled\n");
> + return -EBUSY;
> + }
> +
>   arm_smmu_detach_dev(master);
>  
>   mutex_lock(&smmu_domain->init_mutex);
> @@ -2310,6 +2320,7 @@ static struct iommu_device 
> *arm_smmu_probe_device(struct device *dev)
>   master->smmu = smmu;
>   master->sids = fwspec->ids;
>   master->num_sids = fwspec->num_ids;
> + INIT_LIST_HEAD(&master->bonds);
>   dev_iommu_priv_set(dev, master);
>  
>   /* Check the SIDs are in range of the SMMU and our stream table */
> @@ -2362,6 +2373,7 @@ static void arm_smmu_release_device(struct device *dev)
>   return;
>  
>   master = dev_iommu_priv_get(dev);
> + WARN_ON(arm_smmu_master_sva_enabled(master));
>   arm_smmu_detach_dev(master);
>   arm_smmu_disable_pasid(master);
>   kfree(master);
> @@ -2479,6 +2491,69 @@ static void arm_smmu_get_resv_regions(struct device 
> *dev,
>   iommu_dma_get_resv_regions(dev, head);
>  }
>  
> +static bool arm_smmu_dev_has_feature(struct device *dev,
> +  enum iommu_dev_features feat)
> +{
> + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +
> + if (!master)
> + return false;
> +
> + switch (feat) {
> + case IOMMU_DEV_FEAT_SVA:
> + return arm_smmu_master_sva_supported(master);
> + default:
> + return false;
> + }
> +}
> +
> +static bool arm_smmu_dev_feature_enabled(struct device *dev,
> +  enum iommu_dev_features feat)
> +{
> + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +
> + if (!master)
> + return false;
> +
> + switch (feat) {
> + case IOMMU_DEV_FEAT_SVA:
> + return arm_smmu_master_sva_enabled(master);
> + default:
> + return false;
> + }
> +}
> +
> +static int arm_smmu_dev_enable_feature(struct device *dev,
> +enum iommu_dev_features feat)
> +{
> + if (!arm_smmu_dev_has_feature(dev, feat))
> + return -ENODEV;
> +
> + if (arm_smmu_dev_feature_enabled(dev, feat))
> + return -EBUSY;
> +
> + switch (feat) {
> + case IOMMU_DEV_FEAT_SVA:
> + return arm_smmu_master_enable_sva(dev_iommu_priv_get(dev));
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +static int arm_smmu_dev_disable_feature(struct device *dev,
> + enum iommu_dev_features feat)
> +{
> + if (!arm_smmu_dev_feature_enabled(dev, feat))
> + return -EINVAL;
> +
> + switch (feat) {
> + case IOMMU_DEV_FEAT_SVA:
> + return arm_smmu_master_disable_sva(dev_iommu_priv_get(dev));
> + default:
> + return -EINVAL;
> + }
> +}
> +
>  static struct iommu_ops arm_smmu_ops = {
>   .capable = arm_smmu_capable,
>   .domain_alloc   = arm_smmu_domain_alloc,
> @@ -2497,6 +2572,10 @@ static struct iommu_ops arm_smmu_ops = {
>   .of_xlate   = arm_smmu_of_xlate,
>   .get_resv_regions   = arm_smmu_get_resv_regions,
>   .put_resv_regions   = generic_iommu_put_resv_regions,
> + .dev_has_feat   = arm_smmu_dev_has_feature,
> + .dev_feat_enabled   = arm_smmu_dev_feature_enabled,
> + .dev_enable_feat = arm_smmu_dev_enable_feature,
> + .dev_disable_feat   = arm_smmu_dev_disable_feature,
>   .pgsize_bitmap  = -1UL, /* Restricted during device attach */
>  };
>  
> 
Besides

Reviewed-by: Eric Auger 

Eric
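
As a reading aid, the return codes of the feature callbacks quoted above can be
summarised with a small user-space model. It is a sketch under assumptions:
struct model_master and the model_* functions are hypothetical, and the real
arm_smmu_master_enable_sva()/arm_smmu_master_disable_sva() calls, whose bodies
are not shown in this patch, are assumed to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SVA-related state of struct arm_smmu_master. */
struct model_master {
	bool sva_supported;
	bool sva_enabled;
};

/* Mirrors arm_smmu_dev_enable_feature() for IOMMU_DEV_FEAT_SVA. */
static int model_enable_sva(struct model_master *m)
{
	assert(m);
	if (!m->sva_supported)
		return -ENODEV;		/* feature not available on this device */
	if (m->sva_enabled)
		return -EBUSY;		/* already enabled */
	m->sva_enabled = true;		/* assumed to always succeed here */
	return 0;
}

/* Mirrors arm_smmu_dev_disable_feature() for IOMMU_DEV_FEAT_SVA. */
static int model_disable_sva(struct model_master *m)
{
	assert(m);
	if (!m->sva_enabled)
		return -EINVAL;		/* nothing to disable */
	m->sva_enabled = false;
	return 0;
}

/* Mirrors the new check in arm_smmu_attach_dev(): refuse while SVA is on. */
static int model_attach(const struct model_master *m)
{
	assert(m);
	return m->sva_enabled ? -EBUSY : 0;
}
```

The point of the model is the ordering: a device has to disable SVA before it
can be attached to another domain.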

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v9 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-09-08 Thread Auger Eric
Hi Jean,
On 8/17/20 7:15 PM, Jean-Philippe Brucker wrote:
> Aggregate all sanity-checks for sharing CPU page tables with the SMMU
> under a single ARM_SMMU_FEAT_SVA bit. For PCIe SVA, users also need to
> check FEAT_ATS and FEAT_PRI. For platform SVA, they will have to check
> FEAT_STALLS.
> 
> Introduce ARM_SMMU_FEAT_BTM (Broadcast TLB Maintenance), but don't
> enable it at the moment. Since the entire VMID space is shared with the
> CPU, enabling DVM (by clearing SMMU_CR2.PTM) could result in
> over-invalidation and affect performance of stage-2 mappings.
In which series do you plan to enable it?
> 
> Cc: Suzuki K Poulose 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 10 +
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 43 +++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  3 ++
>  3 files changed, 56 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 90c08f156b43..7b14b48a26c7 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -602,6 +602,8 @@ struct arm_smmu_device {
>  #define ARM_SMMU_FEAT_STALL_FORCE (1 << 13)
>  #define ARM_SMMU_FEAT_VAX (1 << 14)
>  #define ARM_SMMU_FEAT_RANGE_INV  (1 << 15)
> +#define ARM_SMMU_FEAT_BTM (1 << 16)
> +#define ARM_SMMU_FEAT_SVA (1 << 17)
>   u32 features;
>  
>  #define ARM_SMMU_OPT_SKIP_PREFETCH   (1 << 0)
> @@ -683,4 +685,12 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
> *smmu_domain, int ssid,
>  void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
>  bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
>  
> +#ifdef CONFIG_ARM_SMMU_V3_SVA
> +bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
> +#else /* CONFIG_ARM_SMMU_V3_SVA */
> +static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
> +{
> + return false;
> +}
> +#endif /* CONFIG_ARM_SMMU_V3_SVA */
>  #endif /* _ARM_SMMU_V3_H */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index e919ce894dd1..bf81d91ce71e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -153,3 +153,46 @@ static void arm_smmu_free_shared_cd(struct 
> arm_smmu_ctx_desc *cd)
>   kfree(cd);
>   }
>  }
> +
> +bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
> +{
> + unsigned long reg, fld;
> + unsigned long oas;
> + unsigned long asid_bits;
> +
> + u32 feat_mask = ARM_SMMU_FEAT_BTM | ARM_SMMU_FEAT_COHERENCY;
> +
> + if ((smmu->features & feat_mask) != feat_mask)
> + return false;
> +
> + if (!(smmu->pgsize_bitmap & PAGE_SIZE))
> + return false;
If we were to check VA_BITS versus SMMU capabilities I guess this would
be here?
> +
> + /*
> +  * Get the smallest PA size of all CPUs (sanitized by cpufeature). We're
> +  * not even pretending to support AArch32 here. Abort if the MMU outputs
> +  * addresses larger than what we support.
> +  */
> + reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> + fld = cpuid_feature_extract_unsigned_field(reg, 
> ID_AA64MMFR0_PARANGE_SHIFT);
> + oas = id_aa64mmfr0_parange_to_phys_shift(fld);
> + if (smmu->oas < oas)
> + return false;
> +
> + /* We can support bigger ASIDs than the CPU, but not smaller */
> + fld = cpuid_feature_extract_unsigned_field(reg, 
> ID_AA64MMFR0_ASID_SHIFT);
> + asid_bits = fld ? 16 : 8;
> + if (smmu->asid_bits < asid_bits)
> + return false;
> +
> + /*
> +  * See max_pinned_asids in arch/arm64/mm/context.c. The following is
> +  * generally the maximum number of bindable processes.
> +  */
> + if (IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
Out of curiosity, what is the rationale behind using
arm64_kernel_unmapped_at_el0() versus
IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0)?
CPU caps being finalized? Is that why you say "generally" here?
> + asid_bits--;
> + dev_dbg(smmu->dev, "%d shared contexts\n", (1 << asid_bits) -
> + num_possible_cpus() - 2);
nit: s/shared/bindable?
> +
> + return true;
> +}
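
To make the number printed by the dev_dbg() above concrete, here is a small
user-space sketch of the same arithmetic. model_bindable_contexts() is a
hypothetical helper; in the patch the inputs come from the sanitised ID
register, the kernel configuration and num_possible_cpus(), whereas here they
are plain parameters.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors the arithmetic of the quoted code: the patch drops one ASID bit
 * when CONFIG_UNMAP_KERNEL_AT_EL0 is enabled, then subtracts the number of
 * possible CPUs and 2 from the size of the ASID space.
 */
static long model_bindable_contexts(unsigned int asid_bits, bool kpti,
				    unsigned int nr_cpus)
{
	assert(asid_bits >= 1 && asid_bits <= 16);
	if (kpti)
		asid_bits--;
	return (1L << asid_bits) - (long)nr_cpus - 2;
}
```

For example, with 16-bit ASIDs, the kernel unmapped at EL0 and 8 possible CPUs,
the model gives 32768 - 8 - 2 = 32758 contexts.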
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 9e755caea525..15cb3d9c1a5d 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3258,6 +3258,9 @@ static int arm_smmu_device_hw_probe(struct 
> arm_smmu_device *smmu)
>  
>   smmu->ias = max(smmu->ias, smmu->oas);
>  
> + if (arm_smmu_sva_supported(smmu))
> + smmu->features |= ARM_SMMU_FEAT_SVA;
> +
>   dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
>smmu->ias, 

Re: [PATCH RESEND v9 07/13] iommu/arm-smmu-v3: Move definitions to a header

2020-09-08 Thread Auger Eric
Hi Jean,

On 8/17/20 7:15 PM, Jean-Philippe Brucker wrote:
> Allow sharing structure definitions with the upcoming SVA support for
> Arm SMMUv3, by moving them to a separate header. We could surgically
> extract only what is needed but keeping all definitions in one place
> looks nicer.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 675 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 660 +--
>  2 files changed, 676 insertions(+), 659 deletions(-)
>  create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> new file mode 100644
> index ..51a9ce07b2d6
> --- /dev/null
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -0,0 +1,675 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * IOMMU API for ARM architected SMMUv3 implementations.
> + *
> + * Copyright (C) 2015 ARM Limited
> + */
> +
> +#ifndef _ARM_SMMU_V3_H
> +#define _ARM_SMMU_V3_H
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* MMIO registers */
> +#define ARM_SMMU_IDR0 0x0
> +#define IDR0_ST_LVL  GENMASK(28, 27)
> +#define IDR0_ST_LVL_2LVL 1
> +#define IDR0_STALL_MODEL GENMASK(25, 24)
> +#define IDR0_STALL_MODEL_STALL   0
> +#define IDR0_STALL_MODEL_FORCE   2
> +#define IDR0_TTENDIAN GENMASK(22, 21)
> +#define IDR0_TTENDIAN_MIXED  0
> +#define IDR0_TTENDIAN_LE 2
> +#define IDR0_TTENDIAN_BE 3
> +#define IDR0_CD2L (1 << 19)
> +#define IDR0_VMID16  (1 << 18)
> +#define IDR0_PRI (1 << 16)
> +#define IDR0_SEV (1 << 14)
> +#define IDR0_MSI (1 << 13)
> +#define IDR0_ASID16  (1 << 12)
> +#define IDR0_ATS (1 << 10)
> +#define IDR0_HYP (1 << 9)
> +#define IDR0_COHACC  (1 << 4)
> +#define IDR0_TTF GENMASK(3, 2)
> +#define IDR0_TTF_AARCH64 2
> +#define IDR0_TTF_AARCH32_64  3
> +#define IDR0_S1P (1 << 1)
> +#define IDR0_S2P (1 << 0)
> +
> +#define ARM_SMMU_IDR1 0x4
> +#define IDR1_TABLES_PRESET   (1 << 30)
> +#define IDR1_QUEUES_PRESET   (1 << 29)
> +#define IDR1_REL (1 << 28)
> +#define IDR1_CMDQS   GENMASK(25, 21)
> +#define IDR1_EVTQS   GENMASK(20, 16)
> +#define IDR1_PRIQS   GENMASK(15, 11)
> +#define IDR1_SSIDSIZE GENMASK(10, 6)
> +#define IDR1_SIDSIZE GENMASK(5, 0)
> +
> +#define ARM_SMMU_IDR3 0xc
> +#define IDR3_RIL (1 << 10)
> +
> +#define ARM_SMMU_IDR5 0x14
> +#define IDR5_STALL_MAX   GENMASK(31, 16)
> +#define IDR5_GRAN64K (1 << 6)
> +#define IDR5_GRAN16K (1 << 5)
> +#define IDR5_GRAN4K  (1 << 4)
> +#define IDR5_OAS GENMASK(2, 0)
> +#define IDR5_OAS_32_BIT  0
> +#define IDR5_OAS_36_BIT  1
> +#define IDR5_OAS_40_BIT  2
> +#define IDR5_OAS_42_BIT  3
> +#define IDR5_OAS_44_BIT  4
> +#define IDR5_OAS_48_BIT  5
> +#define IDR5_OAS_52_BIT  6
> +#define IDR5_VAX GENMASK(11, 10)
> +#define IDR5_VAX_52_BIT  1
> +
> +#define ARM_SMMU_CR0 0x20
> +#define CR0_ATSCHK   (1 << 4)
> +#define CR0_CMDQEN   (1 << 3)
> +#define CR0_EVTQEN   (1 << 2)
> +#define CR0_PRIQEN   (1 << 1)
> +#define CR0_SMMUEN   (1 << 0)
> +
> +#define ARM_SMMU_CR0ACK  0x24
> +
> +#define ARM_SMMU_CR1 0x28
> +#define CR1_TABLE_SH GENMASK(11, 10)
> +#define CR1_TABLE_OC GENMASK(9, 8)
> +#define CR1_TABLE_IC GENMASK(7, 6)
> +#define CR1_QUEUE_SH GENMASK(5, 4)
> +#define CR1_QUEUE_OC GENMASK(3, 2)
> +#define CR1_QUEUE_IC GENMASK(1, 0)
> +/* CR1 cacheability fields don't quite follow the usual TCR-style encoding */
> +#define CR1_CACHE_NC 0
> +#define CR1_CACHE_WB 1
> +#define CR1_CACHE_WT 2
> +
> +#define ARM_SMMU_CR2 0x2c
> +#define CR2_PTM  (1 << 2)
> +#define CR2_RECINVSID (1 << 1)
> +#define CR2_E2H  (1 << 0)
> +
> +#define ARM_SMMU_GBPA 0x44
> +#define GBPA_UPDATE  (1 << 31)
> 

Re: [PATCH RESEND v9 03/13] iommu/sva: Add PASID helpers

2020-09-08 Thread Auger Eric
Hi Jean,

On 8/17/20 7:15 PM, Jean-Philippe Brucker wrote:
> Let IOMMU drivers allocate a single PASID per mm. Store the mm in the
> IOASID set to allow refcounting and searching mm by PASID, when handling
> an I/O page fault.
> 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Kconfig |  5 +++
>  drivers/iommu/Makefile|  1 +
>  drivers/iommu/iommu-sva-lib.h | 15 +++
>  drivers/iommu/iommu-sva-lib.c | 85 +++
>  4 files changed, 106 insertions(+)
>  create mode 100644 drivers/iommu/iommu-sva-lib.h
>  create mode 100644 drivers/iommu/iommu-sva-lib.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index bef5d75e306b..fb1787377eb6 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -103,6 +103,11 @@ config IOMMU_DMA
>   select IRQ_MSI_IOMMU
>   select NEED_SG_DMA_LENGTH
>  
> +# Shared Virtual Addressing library
> +config IOMMU_SVA_LIB
> + bool
> + select IOASID
> +
>  config FSL_PAMU
>   bool "Freescale IOMMU support"
>   depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 11f1771104f3..61bd30cd8369 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -27,3 +27,4 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
> new file mode 100644
> index ..b40990aef3fd
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva-lib.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * SVA library for IOMMU drivers
> + */
> +#ifndef _IOMMU_SVA_LIB_H
> +#define _IOMMU_SVA_LIB_H
> +
> +#include 
> +#include 
> +
> +int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
> +void iommu_sva_free_pasid(struct mm_struct *mm);
> +struct mm_struct *iommu_sva_find(ioasid_t pasid);
> +
> +#endif /* _IOMMU_SVA_LIB_H */
> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> new file mode 100644
> index ..db7e6c104d6b
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva-lib.c
> @@ -0,0 +1,85 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Helpers for IOMMU drivers implementing SVA
> + */
> +#include 
> +#include 
> +
> +#include "iommu-sva-lib.h"
> +
> +static DEFINE_MUTEX(iommu_sva_lock);
> +static DECLARE_IOASID_SET(iommu_sva_pasid);
> +
> +/**
> + * iommu_sva_alloc_pasid - Allocate a PASID for the mm
> + * @mm: the mm
> + * @min: minimum PASID value (inclusive)
> + * @max: maximum PASID value (inclusive)
> + *
> + * Try to allocate a PASID for this mm, or take a reference to the existing 
> one
> + * provided it fits within the [min, max] range. On success the PASID is
> + * available in mm->pasid, and must be released with iommu_sva_free_pasid().
> + *
> + * Returns 0 on success and < 0 on error.
> + */
> +int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> +{
> + int ret = 0;
> + ioasid_t pasid;
> +
> + if (min == INVALID_IOASID || max == INVALID_IOASID ||
> + min == 0 || max < min)
You may want to add a comment explaining why min == 0 is forbidden.
> + return -EINVAL;
> +
> + mutex_lock(&iommu_sva_lock);
> + if (mm->pasid) {
> + if (mm->pasid >= min && mm->pasid <= max)
> + ioasid_get(mm->pasid);
> + else
> + ret = -EOVERFLOW;
> + } else {
> + pasid = ioasid_alloc(&iommu_sva_pasid, min, max, mm);
> + if (pasid == INVALID_IOASID)
> + ret = -ENOMEM;
> + else
> + mm->pasid = pasid;
> + }
> + mutex_unlock(&iommu_sva_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
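
The control flow of iommu_sva_alloc_pasid() can be summarised with a
user-space sketch. Everything named model_* or MODEL_* is hypothetical: the
toy allocator simply hands out min, locking is omitted, and
MODEL_INVALID_IOASID is only a stand-in for the kernel's INVALID_IOASID. The
sketch also shows one plausible reason why min == 0 is rejected: the function
treats mm->pasid == 0 as "no PASID allocated yet", so 0 cannot be a valid
PASID.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_INVALID_IOASID UINT32_MAX	/* stand-in for INVALID_IOASID */

/* Hypothetical stand-in for the PASID state kept per mm_struct. */
struct model_mm {
	uint32_t pasid;		/* 0 means "no PASID allocated yet" */
	unsigned int refs;	/* references held on that PASID */
};

static int model_sva_alloc_pasid(struct model_mm *mm, uint32_t min,
				 uint32_t max)
{
	assert(mm);
	if (min == MODEL_INVALID_IOASID || max == MODEL_INVALID_IOASID ||
	    min == 0 || max < min)
		return -EINVAL;

	if (mm->pasid) {
		/* Reuse the existing PASID only if it fits the caller's range. */
		if (mm->pasid < min || mm->pasid > max)
			return -EOVERFLOW;
		mm->refs++;
		return 0;
	}

	/* Toy allocator: always picks the lowest value of the range. */
	mm->pasid = min;
	mm->refs = 1;
	return 0;
}
```

A second caller whose range does not contain the PASID already chosen for the
mm gets -EOVERFLOW instead of a second PASID, which is what keeps the mapping
at one PASID per mm.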
> +
> +/**
> + * iommu_sva_free_pasid - Release the mm's PASID
> + * @mm: the mm.
> + *
> + * Drop one reference to a PASID allocated with iommu_sva_alloc_pasid()
> + */
> +void iommu_sva_free_pasid(struct mm_struct *mm)
> +{
> + mutex_lock(&iommu_sva_lock);
> + if (ioasid_put(mm->pasid))
> + mm->pasid = 0;
ditto: 0 versus INVALID_IOASID
> + mutex_unlock(&iommu_sva_lock);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_free_pasid);
> +
> +/* ioasid wants a void * argument */
shouldn't it be:
ioasid_find getter() requires a void *arg?
> +static bool __mmget_not_zero(void *mm)
> +{
> + return mmget_not_zero(mm);
> +}
> +
> +/**
> + * iommu_sva_find() - Find mm associated to the given PASID
> + * @pasid: Process Address Space ID assigned to the mm
> + *
> + * On success a reference to the mm is taken, and must be released with 
> mmput().
> + *
> + * Returns the mm corresponding to this PASID, or an error if not found.
> + */
> +struct mm_struct *iommu_sva_find(ioasid_t pasid)
> +{
> + return 

Re: [PATCH RESEND v9 08/13] iommu/arm-smmu-v3: Share process page tables

2020-09-08 Thread Auger Eric
Hi Jean,

On 8/17/20 7:15 PM, Jean-Philippe Brucker wrote:
> With Shared Virtual Addressing (SVA), we need to mirror CPU TTBR, TCR,
> MAIR and ASIDs in SMMU contexts. Each SMMU has a single ASID space split
> into two sets, shared and private. Shared ASIDs correspond to those
> obtained from the arch ASID allocator, and private ASIDs are used for
> "classic" map/unmap DMA.
> 
> A possible conflict happens when trying to use a shared ASID that has
> already been allocated for private use by the SMMU driver. This will be
> addressed in a later patch by replacing the private ASID. At the
> moment we return -EBUSY.
> 
> Each mm_struct shared with the SMMU will have a single context
> descriptor. Add a refcount to keep track of this. It will be protected
> by the global SVA lock.
> 
> Introduce a new arm-smmu-v3-sva.c file and the CONFIG_ARM_SMMU_V3_SVA
> option to let users opt in SVA support.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
> v9: Move to arm-smmu-v3-sva.c
> ---
>  drivers/iommu/Kconfig |  10 ++
>  drivers/iommu/arm/arm-smmu-v3/Makefile|   5 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   8 ++
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 123 ++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  34 -
>  5 files changed, 172 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index fb1787377eb6..b1d592cd9984 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -313,6 +313,16 @@ config ARM_SMMU_V3
> Say Y here if your system includes an IOMMU device implementing
> the ARM SMMUv3 architecture.
>  
> +config ARM_SMMU_V3_SVA
> + bool "Shared Virtual Addressing support for the ARM SMMUv3"
> + depends on ARM_SMMU_V3
> + help
> +   Support for sharing process address spaces with devices using the
> +   SMMUv3.
> +
> +   Say Y here if your system supports SVA extensions such as PCIe PASID
> +   and PRI.
> +
>  config S390_IOMMU
>   def_bool y if S390 && PCI
>   depends on S390 && PCI
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile 
> b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index 569e24e9f162..54feb1ecccad 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -1,2 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
> +obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
> +arm_smmu_v3-objs-y += arm-smmu-v3.o
> +arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
> +arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 51a9ce07b2d6..6b06a6f19604 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -540,6 +540,9 @@ struct arm_smmu_ctx_desc {
>   u64 ttbr;
>   u64 tcr;
>   u64 mair;
> +
> + refcount_t  refs;
> + struct mm_struct*mm;
>  };
>  
>  struct arm_smmu_l1_ctx_desc {
> @@ -672,4 +675,9 @@ struct arm_smmu_domain {
>   spinlock_t  devices_lock;
>  };
>  
> +extern struct xarray arm_smmu_asid_xa;
> +extern struct mutex arm_smmu_asid_lock;
> +
> +bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
> +
>  #endif /* _ARM_SMMU_V3_H */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> new file mode 100644
> index ..7a4f40565e06
> --- /dev/null
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -0,0 +1,123 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Implementation of the IOMMU SVA API for the ARM SMMUv3
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "arm-smmu-v3.h"
> +#include "../../io-pgtable-arm.h"
> +
> +static struct arm_smmu_ctx_desc *
> +arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
> +{
> + struct arm_smmu_ctx_desc *cd;
> +
> + cd = xa_load(&arm_smmu_asid_xa, asid);
> + if (!cd)
> + return NULL;
> +
> + if (cd->mm) {
> + if (WARN_ON(cd->mm != mm))
> + return ERR_PTR(-EINVAL);
> + /* All devices bound to this mm use the same cd struct. */
> + refcount_inc(&cd->refs);
> + return cd;
> + }
> +
> + /* Ouch, ASID is already in use for a private cd. */
> + return ERR_PTR(-EBUSY);
> +}
> +
> +__maybe_unused
> +static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct 
> *mm)
> +{
> + u16 asid;
> + int err = 0;
> + u64 tcr, par, reg;
> + struct arm_smmu_ctx_desc *cd;
> + struct arm_smmu_ctx_desc *ret = NULL;
> +
> + asid = arm64_mm_context_get(mm);
> + if (!asid)
> + 

Re: [PATCH RESEND v9 09/13] iommu/arm-smmu-v3: Seize private ASID

2020-09-07 Thread Auger Eric
Hi Jean,

On 8/17/20 7:15 PM, Jean-Philippe Brucker wrote:
> The SMMU has a single ASID space, the union of shared and private ASID
> sets. This means that the SMMU driver competes with the arch allocator
> for ASIDs. Shared ASIDs are those of Linux processes, allocated by the
> arch, and contribute in broadcast TLB maintenance. Private ASIDs are
> allocated by the SMMU driver and used for "classic" map/unmap DMA. They
> require command-queue TLB invalidations.
> 
> When we pin down an mm_context and get an ASID that is already in use by
> the SMMU, it belongs to a private context. We used to simply abort the
> bind, but this is unfair to users that would be unable to bind a few
> seemingly random processes. Try to allocate a new private ASID for the
> context, and make the old ASID shared.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  3 ++
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 36 +--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 34 +++---
>  3 files changed, 58 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 6b06a6f19604..90c08f156b43 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -678,6 +678,9 @@ struct arm_smmu_domain {
>  extern struct xarray arm_smmu_asid_xa;
>  extern struct mutex arm_smmu_asid_lock;
>  
> +int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
> + struct arm_smmu_ctx_desc *cd);
> +void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
>  bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
>  
>  #endif /* _ARM_SMMU_V3_H */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index 7a4f40565e06..e919ce894dd1 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -10,10 +10,19 @@
>  #include "arm-smmu-v3.h"
>  #include "../../io-pgtable-arm.h"
>  
> +/*
> + * Try to reserve this ASID in the SMMU. If it is in use, try to steal it 
> from
> + * the private entry. Careful here, we may be modifying the context tables of
> + * another SMMU!
Not sure I got what you meant by this comment.
> + */
>  static struct arm_smmu_ctx_desc *
>  arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>  {
> + int ret;
> + u32 new_asid;
>   struct arm_smmu_ctx_desc *cd;
> + struct arm_smmu_device *smmu;
> + struct arm_smmu_domain *smmu_domain;
>  
>   cd = xa_load(&arm_smmu_asid_xa, asid);
>   if (!cd)
> @@ -27,8 +36,31 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>   return cd;
>   }
>  
> - /* Ouch, ASID is already in use for a private cd. */
> - return ERR_PTR(-EBUSY);
> + smmu_domain = container_of(cd, struct arm_smmu_domain, s1_cfg.cd);
> + smmu = smmu_domain->smmu;
> +
> + ret = xa_alloc(&arm_smmu_asid_xa, &new_asid, cd,
> +XA_LIMIT(1, 1 << smmu->asid_bits), GFP_KERNEL);
XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL)
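
To spell out the off-by-one: XA_LIMIT() takes an inclusive maximum, so an
upper bound of 1 << asid_bits admits one value that no longer fits in
asid_bits bits. The user-space sketch below only illustrates that arithmetic;
model_asid_max() and model_limit_fits_u16() are hypothetical helpers, not
kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Largest ASID representable with the given number of bits. */
static uint32_t model_asid_max(unsigned int asid_bits)
{
	assert(asid_bits >= 1 && asid_bits < 32);
	return (UINT32_C(1) << asid_bits) - 1;
}

/* True if every value up to an inclusive maximum fits in a u16 ASID. */
static int model_limit_fits_u16(uint32_t max)
{
	return max <= UINT16_MAX;
}
```

With 16-bit ASIDs the suggested bound is 65535, which fits in a u16, whereas
the bound in the patch (65536) does not and would be truncated when stored in
cd->asid.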
> + if (ret)
> + return ERR_PTR(-ENOSPC);
> + /*
> +  * Race with unmap: TLB invalidations will start targeting the new ASID,
> +  * which isn't assigned yet. We'll do an invalidate-all on the old ASID
> +  * later, so it doesn't matter.
> +  */
> + cd->asid = new_asid;
> + /*
> +  * Update ASID and invalidate CD in all associated masters. There will
> +  * be some overlap between use of both ASIDs, until we invalidate the
> +  * TLB.
> +  */
> + arm_smmu_write_ctx_desc(smmu_domain, 0, cd);
> +
> + /* Invalidate TLB entries previously associated with that context */
> + arm_smmu_tlb_inv_asid(smmu, asid);
> +
> + xa_erase(&arm_smmu_asid_xa, asid);
> + return NULL;
>  }
>  
>  __maybe_unused
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 9e81615744de..9e755caea525 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -873,6 +873,17 @@ static int arm_smmu_cmdq_batch_submit(struct 
> arm_smmu_device *smmu,
>  }
>  
>  /* Context descriptor manipulation functions */
> +void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
> +{
> + struct arm_smmu_cmdq_ent cmd = {
> + .opcode = CMDQ_OP_TLBI_NH_ASID,
> + .tlbi.asid = asid,
> + };
> +
> + arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> + arm_smmu_cmdq_issue_sync(smmu);
> +}
> +
>  static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
>int ssid, bool leaf)
>  {
> @@ -953,8 +964,8 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain 
> *smmu_domain,
>   return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS;
>  }
>  

Re: [PATCH v2 3/9] iommu/ioasid: Introduce ioasid_set APIs

2020-09-07 Thread Auger Eric
Hi Jacob,

On 9/3/20 11:07 PM, Jacob Pan wrote:
> On Tue, 1 Sep 2020 13:51:26 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>>
>> On 8/22/20 6:35 AM, Jacob Pan wrote:
>>> ioasid_set was introduced as an arbitrary token that are shared by
>>> a  
>> that is
> got it
> 
>>> group of IOASIDs. For example, if IOASID #1 and #2 are allocated
>>> via the same ioasid_set*, they are viewed as to belong to the same
>>> set.  
>> two IOASIDs allocated via the same ioasid_set pointer belong to the
>> same set?
>>>
> yes, better.
> 
>>> For guest SVA usages, system-wide IOASID resources need to be
>>> partitioned such that VMs can have its own quota and being managed  
>> their own
> right,
> 
>>> separately. ioasid_set is the perfect candidate for meeting such
>>> requirements. This patch redefines and extends ioasid_set with the
>>> following new fields:
>>> - Quota
>>> - Reference count
>>> - Storage of its namespace
>>> - The token is stored in the new ioasid_set but with optional types
>>>
>>> ioasid_set level APIs are introduced that wires up these new data.  
>> that wire
> right
> 
>>> Existing users of IOASID APIs are converted where a host IOASID set
>>> is allocated for bare-metal usage.
>>>
>>> Signed-off-by: Liu Yi L 
>>> Signed-off-by: Jacob Pan 
>>> ---
>>>  drivers/iommu/intel/iommu.c |  27 ++-
>>>  drivers/iommu/intel/pasid.h |   1 +
>>>  drivers/iommu/intel/svm.c   |   8 +-
>>>  drivers/iommu/ioasid.c  | 390
>>> +---
>>> include/linux/ioasid.h  |  82 -- 5 files changed, 465
>>> insertions(+), 43 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel/iommu.c
>>> b/drivers/iommu/intel/iommu.c index a3a0b5c8921d..5813eeaa5edb
>>> 100644 --- a/drivers/iommu/intel/iommu.c
>>> +++ b/drivers/iommu/intel/iommu.c
>>> @@ -42,6 +42,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  #include 
>>> @@ -103,6 +104,9 @@
>>>   */
>>>  #define INTEL_IOMMU_PGSIZES (~0xFFFUL)
>>>  
>>> +/* PASIDs used by host SVM */
>>> +struct ioasid_set *host_pasid_set;
>>> +
>>>  static inline int agaw_to_level(int agaw)
>>>  {
>>> return agaw + 2;
>>> @@ -3103,8 +3107,8 @@ static void intel_vcmd_ioasid_free(ioasid_t
>>> ioasid, void *data)
>>>  * Sanity check the ioasid owner is done at upper layer,
>>> e.g. VFIO
>>>  * We can only free the PASID when all the devices are
>>> unbound. */
>>> -   if (ioasid_find(NULL, ioasid, NULL)) {
>>> -   pr_alert("Cannot free active IOASID %d\n", ioasid);
>>> +   if (IS_ERR(ioasid_find(host_pasid_set, ioasid, NULL))) {
>>> +   pr_err("Cannot free IOASID %d, not in system
>>> set\n", ioasid);  
>> Not sure the change in the trace is worth it. Also, you could be more
>> explicit, e.g. "IOASID %d to be freed cannot be found in the system
>> ioasid set".
> Yes, better. will do.
> 
>> Shouldn't it be rate-limited, since it originates from
>> user space?
> virtual command is only used in the guest kernel, not from userspace
> though. But I should add ratelimited to all user originated calls.
Sure I mixed things up. Sorry for the noise

Eric
> 
>>> return;
>>> }
>>> vcmd_free_pasid(iommu, ioasid);
>>> @@ -3288,6 +3292,19 @@ static int __init init_dmars(void)
>>> if (ret)
>>> goto free_iommu;
>>>  
>>> +   /* PASID is needed for scalable mode irrespective to SVM */
>>> +   if (intel_iommu_sm) {
>>> +   ioasid_install_capacity(intel_pasid_max_id);
>>> +   /* We should not run out of IOASIDs at boot */
>>> +   host_pasid_set = ioasid_alloc_set(NULL,
>>> PID_MAX_DEFAULT,  
>> s/PID_MAX_DEFAULT/intel_pasid_max_id?
> Not really. When both baremetal and guest SVA are used on the same
> system, we want to limit the baremetal SVM PASIDs to the number of
> host processes. host_pasid_set is for baremetal only.
> 
> intel_pasid_max_id would take up the entire PASID resource and leave no
> PASIDs for guest usages.
> 
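To illustrate the quota argument above, here is a minimal user-space sketch
of the accounting. It is not code from the patch: struct pasid_pool, struct
pasid_set, pool_reserve() and set_alloc_one() are hypothetical names, and the
model only assumes that every set reserves its quota out of one system-wide
capacity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; these names are illustrative, not the ioasid API. */
struct pasid_pool {
	uint32_t capacity;	/* total PASIDs the hardware offers */
	uint32_t reserved;	/* sum of the quotas handed out to sets */
};

struct pasid_set {
	uint32_t quota;		/* upper bound for this set */
	uint32_t used;		/* PASIDs currently allocated from this set */
};

/* Carve a quota out of the pool; fails if the pool cannot cover it. */
static bool pool_reserve(struct pasid_pool *pool, struct pasid_set *set,
			 uint32_t quota)
{
	assert(pool->reserved <= pool->capacity);
	if (quota > pool->capacity - pool->reserved)
		return false;
	pool->reserved += quota;
	set->quota = quota;
	set->used = 0;
	return true;
}

/* Allocate one PASID from a set, honoring its quota. */
static bool set_alloc_one(struct pasid_set *set)
{
	if (set->used >= set->quota)
		return false;
	set->used++;
	return true;
}
```

In this model, a host set whose quota equals the full capacity makes every
later pool_reserve() for a guest set fail, whereas a host quota sized to the
number of host processes leaves the remainder available to guests.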
>>> +
>>> IOASID_SET_TYPE_NULL);  
>> As suggested by Jean-Philippe: ioasid_set_alloc
>>> +   if (IS_ERR

Re: [PATCH v2 1/9] docs: Document IO Address Space ID (IOASID) APIs

2020-09-07 Thread Auger Eric
Hi Jacob,

On 9/1/20 6:56 PM, Jacob Pan wrote:
> Hi Eric,
> 
> On Thu, 27 Aug 2020 18:21:07 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>> On 8/24/20 12:32 PM, Jean-Philippe Brucker wrote:
>>> On Fri, Aug 21, 2020 at 09:35:10PM -0700, Jacob Pan wrote:  
>>>> IOASID is used to identify address spaces that can be targeted by
>>>> device DMA. It is a system-wide resource that is essential to its
>>>> many users. This document is an attempt to help developers from
>>>> all vendors navigate the APIs. At this time, ARM SMMU and Intel’s
>>>> Scalable IO Virtualization (SIOV) enabled platforms are the
>>>> primary users of IOASID. Examples of how SIOV components interact
>>>> with IOASID APIs are provided in that many APIs are driven by the
>>>> requirements from SIOV.
>>>>
>>>> Signed-off-by: Liu Yi L 
>>>> Signed-off-by: Wu Hao 
>>>> Signed-off-by: Jacob Pan 
>>>> ---
>>>>  Documentation/ioasid.rst | 618 +++
>>>>  1 file changed, 618 insertions(+)
>>>>  create mode 100644 Documentation/ioasid.rst
>>>>
>>>> diff --git a/Documentation/ioasid.rst b/Documentation/ioasid.rst  
>>>
>>> Thanks for writing this up. Should it go to
>>> Documentation/driver-api/, or Documentation/driver-api/iommu/? I
>>> think this also needs to Cc linux-...@vger.kernel.org and
>>> cor...@lwn.net 
>>>> new file mode 100644
>>>> index ..b6a8cdc885ff
>>>> --- /dev/null
>>>> +++ b/Documentation/ioasid.rst
>>>> @@ -0,0 +1,618 @@
>>>> +.. ioasid:
>>>> +
>>>> +===================
>>>> +IO Address Space ID
>>>> +===================
>>>> +
>>>> +IOASID is a generic name for PCIe Process Address ID (PASID) or ARM
>>>> +SMMU sub-stream ID. An IOASID identifies an address space that DMA
>>>
>>> "SubstreamID"  
>> On ARM if we don't use PASIDs we have streamids (SID) which can also
>> identify address spaces that DMA requests can target. So maybe this
>> definition is not sufficient.
>>
> According to SMMU spec, the SubstreamID is equivalent to PASID. My
> understanding is that SID is equivalent to PCI requester ID that
> identifies stage 2. Do you plan to use IOASID for stage 2?
No. So actually if PASID is not used we still have a default single
IOASID matching the single context. So that may be fine as a definition.
> IOASID is mostly for SVA and DMA request w/ PASID.
> 
>>>   
>>>> +requests can target.
>>>> +
>>>> +The primary use cases for IOASID are Shared Virtual Address (SVA) and
>>>> +IO Virtual Address (IOVA). However, the requirements for IOASID
>>>
>>> IOVA alone isn't a use case, maybe "multiple IOVA spaces per
>>> device"? 
>>>> +management can vary among hardware architectures.
>>>> +
>>>> +This document covers the generic features supported by IOASID
>>>> +APIs. Vendor-specific use cases are also illustrated with Intel's VT-d
>>>> +based platforms as the first example.
>>>> +
>>>> +.. contents:: :local:
>>>> +
>>>> +Glossary
>>>> +
>>>> +PASID - Process Address Space ID
>>>> +
>>>> +IOASID - IO Address Space ID (generic term for PCIe PASID and
>>>> +sub-stream ID in SMMU)  
>>>
>>> "SubstreamID"
>>>   
>>>> +
>>>> +SVA/SVM - Shared Virtual Addressing/Memory
>>>> +
>>>> +ENQCMD - New Intel X86 ISA for efficient workqueue submission [1]
>>>
>>> Maybe drop the "New", to keep the documentation perennial. It might
>>> be good to add internal links here to the specifications URLs at
>>> the bottom. 
>>>> +
>>>> +DSA - Intel Data Streaming Accelerator [2]
>>>> +
>>>> +VDCM - Virtual device composition module [3]
>>>> +
>>>> +SIOV - Intel Scalable IO Virtualization
>>>> +
>>>> +
>>>> +Key Concepts
>>>> +
>>>> +
>>>> +IOASID Set
>>>> +---
>>>> +An IOASID set is a group of IOASIDs allocated from the system-wide
>>>> +IOASID pool. An IOASID set is created and can be identified by a
>>>> +t

Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-04 Thread Auger Eric
Hi,

On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> Add a topology description to the virtio-iommu driver and enable x86
> platforms.
> 
> Since [v2] we have made some progress on adding ACPI support for
> virtio-iommu, which is the preferred boot method on x86. It will be a
> new vendor-agnostic table describing para-virtual topologies in a
> minimal format. However some platforms don't use either ACPI or DT for
> booting (for example microvm), and will need the alternative topology
> description method proposed here. In addition, since the process to get
> a new ACPI table will take a long time, this provides a boot method even
> to ACPI-based platforms, if only temporarily for testing and
> development.
> 
> v3:
> * Add patch 1 that moves virtio-iommu to a subfolder.
> * Split the rest:
>   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> support.
>   * Patch 4 adds definitions.
>   * Patch 5 adds parser in topology.c.
> * Address other comments.
> 
> Linux and QEMU patches available at:
> https://jpbrucker.net/git/linux virtio-iommu/devel
> https://jpbrucker.net/git/qemu virtio-iommu/devel
I have tested this series with the above QEMU branch on ARM, with virtio-net
and virtio-blk translated devices in non-DT mode.

It works for me:
Tested-by: Eric Auger 

Thanks

Eric

> 
> [spec] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00067.html
> [v2] 
> https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> [v1] 
> https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/
> [rfc] 
> https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> 
> Jean-Philippe Brucker (6):
>   iommu/virtio: Move to drivers/iommu/virtio/
>   iommu/virtio: Add topology helpers
>   PCI: Add DMA configuration for virtual platforms
>   iommu/virtio: Add topology definitions
>   iommu/virtio: Support topology description in config space
>   iommu/virtio: Enable x86 support
> 
>  drivers/iommu/Kconfig |  18 +-
>  drivers/iommu/Makefile|   3 +-
>  drivers/iommu/virtio/Makefile |   4 +
>  drivers/iommu/virtio/topology-helpers.h   |  50 +
>  include/linux/virt_iommu.h|  15 ++
>  include/uapi/linux/virtio_iommu.h |  44 
>  drivers/iommu/virtio/topology-helpers.c   | 196 
>  drivers/iommu/virtio/topology.c   | 259 ++
>  drivers/iommu/{ => virtio}/virtio-iommu.c |   4 +
>  drivers/pci/pci-driver.c  |   5 +
>  MAINTAINERS   |   3 +-
>  11 files changed, 597 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/iommu/virtio/Makefile
>  create mode 100644 drivers/iommu/virtio/topology-helpers.h
>  create mode 100644 include/linux/virt_iommu.h
>  create mode 100644 drivers/iommu/virtio/topology-helpers.c
>  create mode 100644 drivers/iommu/virtio/topology.c
>  rename drivers/iommu/{ => virtio}/virtio-iommu.c (99%)
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/6] iommu/virtio: Add topology helpers

2020-09-04 Thread Auger Eric
Hi Jean,

On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> To support topology description from ACPI and from the builtin
> description, add helpers to keep track of I/O topology descriptors.
> 
> To ease re-use of the helpers by other drivers and future ACPI
> extensions, use the "virt_" prefix rather than "virtio_" when naming
> structs and functions.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Kconfig   |   3 +
>  drivers/iommu/virtio/Makefile   |   1 +
>  drivers/iommu/virtio/topology-helpers.h |  50 ++
>  include/linux/virt_iommu.h  |  15 ++
>  drivers/iommu/virtio/topology-helpers.c | 196 
>  drivers/iommu/virtio/virtio-iommu.c |   4 +
>  MAINTAINERS |   1 +
>  7 files changed, 270 insertions(+)
>  create mode 100644 drivers/iommu/virtio/topology-helpers.h
>  create mode 100644 include/linux/virt_iommu.h
>  create mode 100644 drivers/iommu/virtio/topology-helpers.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index bef5d75e306b..e29ae50f7100 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -391,4 +391,7 @@ config VIRTIO_IOMMU
>  
> Say Y here if you intend to run this kernel as a guest.
>  
> +config VIRTIO_IOMMU_TOPOLOGY_HELPERS
> + bool
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/virtio/Makefile b/drivers/iommu/virtio/Makefile
> index 279368fcc074..b42ad47eac7e 100644
> --- a/drivers/iommu/virtio/Makefile
> +++ b/drivers/iommu/virtio/Makefile
> @@ -1,2 +1,3 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS) += topology-helpers.o
> diff --git a/drivers/iommu/virtio/topology-helpers.h b/drivers/iommu/virtio/topology-helpers.h
> new file mode 100644
> index ..436ca6a900c5
> --- /dev/null
> +++ b/drivers/iommu/virtio/topology-helpers.h
> @@ -0,0 +1,50 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef TOPOLOGY_HELPERS_H_
> +#define TOPOLOGY_HELPERS_H_
> +
> +#ifdef CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS
> +
> +/* Identify a device node in the topology */
> +struct virt_topo_dev_id {
> + unsigned int type;
> +#define VIRT_TOPO_DEV_TYPE_PCI   1
> +#define VIRT_TOPO_DEV_TYPE_MMIO  2
> + union {
> + /* PCI endpoint or range */
> + struct {
> + u16 segment;
> + u16 bdf_start;
> + u16 bdf_end;
> + };
> + /* MMIO region */
> + u64 base;
> + };
> +};
> +
> +/* Specification of an IOMMU */
> +struct virt_topo_iommu {
> + struct virt_topo_dev_id dev_id;
> + struct device   *dev; /* transport device */
> + struct fwnode_handle *fwnode;
> + struct iommu_ops *ops;
> + struct list_head list;
> +};
> +
> +/* Specification of an endpoint */
> +struct virt_topo_endpoint {
> + struct virt_topo_dev_id dev_id;
> + u32 endpoint_id;
> + struct virt_topo_iommu  *viommu;
> + struct list_head list;
> +};
> +
> +void virt_topo_add_endpoint(struct virt_topo_endpoint *ep);
> +void virt_topo_add_iommu(struct virt_topo_iommu *viommu);
> +
> +void virt_topo_set_iommu_ops(struct device *dev, struct iommu_ops *ops);
> +
> +#else /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS */
> +static inline void virt_topo_set_iommu_ops(struct device *dev, struct iommu_ops *ops)
> +{ }
> +#endif /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS */
> +#endif /* TOPOLOGY_HELPERS_H_ */
> diff --git a/include/linux/virt_iommu.h b/include/linux/virt_iommu.h
> new file mode 100644
> index ..17d2bd4732e0
> --- /dev/null
> +++ b/include/linux/virt_iommu.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef VIRT_IOMMU_H_
> +#define VIRT_IOMMU_H_
> +
> +#ifdef CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS
> +int virt_dma_configure(struct device *dev);
> +
> +#else /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS */
> +static inline int virt_dma_configure(struct device *dev)
> +{
> + /* Don't disturb the normal DMA configuration methods */
> + return 0;
> +}
> +#endif /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS */
> +#endif /* VIRT_IOMMU_H_ */
> diff --git a/drivers/iommu/virtio/topology-helpers.c b/drivers/iommu/virtio/topology-helpers.c
> new file mode 100644
> index ..8815e3a5d431
> --- /dev/null
> +++ b/drivers/iommu/virtio/topology-helpers.c
> @@ -0,0 +1,196 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "topology-helpers.h"
> +
> +static LIST_HEAD(viommus);
> +static LIST_HEAD(pci_endpoints);
> +static LIST_HEAD(mmio_endpoints);
> 
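The quoted struct virt_topo_dev_id describes a PCI endpoint as a segment plus
a BDF range. As a rough illustration of how a lookup against such an ID could
work, here is a minimal user-space sketch. The matching code of the patch is
not part of this excerpt, so pci_bdf() and topo_id_matches_pci() are
hypothetical helpers, and treating bdf_end as inclusive is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRT_TOPO_DEV_TYPE_PCI	1
#define VIRT_TOPO_DEV_TYPE_MMIO	2

/* User-space mirror of struct virt_topo_dev_id from the quoted patch */
struct topo_dev_id {
	unsigned int type;
	union {
		/* PCI endpoint or range */
		struct {
			uint16_t segment;
			uint16_t bdf_start;
			uint16_t bdf_end;
		};
		/* MMIO region */
		uint64_t base;
	};
};

/* Pack bus/device/function into the usual 16-bit BDF encoding. */
static uint16_t pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);
	return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Hypothetical helper: true if (segment, bdf) falls inside a PCI range ID. */
static bool topo_id_matches_pci(const struct topo_dev_id *id,
				uint16_t segment, uint16_t bdf)
{
	return id->type == VIRT_TOPO_DEV_TYPE_PCI &&
	       id->segment == segment &&
	       bdf >= id->bdf_start && bdf <= id->bdf_end; /* inclusive: assumed */
}
```

A range from pci_bdf(0, 1, 0) to pci_bdf(0, 31, 7) would cover every function
on bus 0 except device 0, which is a plausible layout when device 0 is the
host bridge.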

Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-04 Thread Auger Eric
> +
> + /* Find out if the device supports topology description */
> + iowrite32(0, &common_cfg->device_feature_select);
> + features = ioread32(&common_cfg->device_feature);
> +
> + if (!(features & BIT(VIRTIO_IOMMU_F_TOPOLOGY))) {
> + pci_dbg(dev, "device doesn't have topology description");
> + goto out_reset;
> + }
> +
> + ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, &cap);
> + if (!ret) {
> + pci_warn(dev, "device config capability not found\n");
> + goto out_reset;
> + }
> +
> + regs = pci_iomap(dev, cap.bar, 0);
> + if (!regs)
> + goto out_reset;
> +
> + pci_info(dev, "parsing virtio-iommu topology\n");
> + ret = viommu_parse_topology(&dev->dev, regs + cap.offset,
> + pci_resource_len(dev, 0) - cap.offset);
> + if (ret)
> + pci_warn(dev, "failed to parse topology: %d\n", ret);
> +
> + pci_iounmap(dev, regs);
> +out_reset:
> + ret = viommu_pci_reset(common_cfg);
> + if (ret)
> + pci_warn(dev, "unable to reset device\n");
> +out_unmap_common:
> + pci_iounmap(dev, common_regs);
> +}
> +
> +/*
> + * Catch a PCI virtio-iommu implementation early to get the topology description
> + * before we start probing other endpoints.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1040 + VIRTIO_ID_IOMMU,
> + viommu_pci_parse_topology);
> 
Reviewed-by: Eric Auger 

Eric


Re: [PATCH v3 4/6] iommu/virtio: Add topology definitions

2020-09-04 Thread Auger Eric
Hi Jean,

On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> Add struct definitions for describing endpoints managed by the
> virtio-iommu. When VIRTIO_IOMMU_F_TOPOLOGY is offered, an array of
> virtio_iommu_topo_* structures in config space describes the endpoints,
> identified either by their PCI BDF or their physical MMIO address.
> 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  include/uapi/linux/virtio_iommu.h | 44 +++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
> index 237e36a280cb..70cba30644d5 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -16,6 +16,7 @@
>  #define VIRTIO_IOMMU_F_BYPASS 3
>  #define VIRTIO_IOMMU_F_PROBE 4
>  #define VIRTIO_IOMMU_F_MMIO  5
> +#define VIRTIO_IOMMU_F_TOPOLOGY  6
>  
>  struct virtio_iommu_range_64 {
>   __le64  start;
> @@ -27,6 +28,17 @@ struct virtio_iommu_range_32 {
>   __le32  end;
>  };
>  
> +struct virtio_iommu_topo_config {
> + /* Number of topology description structures */
> + __le16  count;
> + /*
> +  * Offset to the first topology description structure
> +  * (virtio_iommu_topo_*) from the start of the virtio_iommu config
> +  * space. Aligned on 8 bytes.
> +  */
> + __le16  offset;
> +};
> +
>  struct virtio_iommu_config {
>   /* Supported page sizes */
>   __le64  page_size_mask;
> @@ -36,6 +48,38 @@ struct virtio_iommu_config {
>   struct virtio_iommu_range_32 domain_range;
>   /* Probe buffer size */
>   __le32  probe_size;
> + struct virtio_iommu_topo_config topo_config;
> +};
> +
> +#define VIRTIO_IOMMU_TOPO_PCI_RANGE  0x1
> +#define VIRTIO_IOMMU_TOPO_MMIO   0x2
> +
> +struct virtio_iommu_topo_pci_range {
> + /* VIRTIO_IOMMU_TOPO_PCI_RANGE */
> + __u8 type;
> + __u8 reserved;
> + /* Length of this structure */
> + __le16  length;
> + /* First endpoint ID in the range */
> + __le32  endpoint_start;
> + /* PCI domain number */
> + __le16  segment;
> + /* PCI Bus:Device.Function range */
> + __le16  bdf_start;
> + __le16  bdf_end;
> + __le16  padding;
> +};
> +
> +struct virtio_iommu_topo_mmio {
> + /* VIRTIO_IOMMU_TOPO_MMIO */
> + __u8 type;
> + __u8 reserved;
> + /* Length of this structure */
> + __le16  length;
> + /* Endpoint ID */
> + __le32  endpoint;
> + /* Address of the first MMIO region */
> + __le64  address;
>  };
>  
>  /* Request types */
> 
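Both descriptor types in the quoted patch start with the same type, reserved
and length fields, so a parser can walk the array generically. Below is a
minimal user-space sketch of such a walk over a snapshot of the config space.
It is not the parser from patch 5: topo_walk(), get_le16() and struct
topo_counts are hypothetical names, and skipping unknown types by their
length field is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_TOPO_PCI_RANGE	0x1
#define VIRTIO_IOMMU_TOPO_MMIO		0x2

#define TOPO_HDR_LEN	4	/* type, reserved, length */
#define TOPO_DESC_LEN	16	/* size of both structures defined above */

/* Read a little-endian 16-bit value regardless of host endianness. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

struct topo_counts {
	unsigned int pci_ranges;
	unsigned int mmio;
	unsigned int unknown;
};

/*
 * Walk `count` descriptors starting at `offset` in a config space snapshot.
 * Returns 0 on success, -1 if a descriptor is malformed or overruns `len`.
 */
static int topo_walk(const uint8_t *cfg, size_t len, uint16_t offset,
		     uint16_t count, struct topo_counts *out)
{
	size_t pos = offset;

	assert(cfg && out);
	out->pci_ranges = out->mmio = out->unknown = 0;

	for (uint16_t i = 0; i < count; i++) {
		uint16_t desc_len;

		if (pos > len || len - pos < TOPO_HDR_LEN)
			return -1;
		desc_len = get_le16(cfg + pos + 2);
		if (desc_len < TOPO_HDR_LEN || desc_len > len - pos)
			return -1;

		switch (cfg[pos]) {
		case VIRTIO_IOMMU_TOPO_PCI_RANGE:
			if (desc_len < TOPO_DESC_LEN)
				return -1;
			out->pci_ranges++;
			break;
		case VIRTIO_IOMMU_TOPO_MMIO:
			if (desc_len < TOPO_DESC_LEN)
				return -1;
			out->mmio++;
			break;
		default:
			out->unknown++;	/* assumed: skip via the length field */
			break;
		}
		pos += desc_len;
	}
	return 0;
}
```

Advancing by the length field rather than by sizeof() is what would let newer
descriptor types be skipped by a parser that does not know them.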

