Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-26 Thread Jean-Philippe Brucker
On 25/09/2018 23:17, Jacob Pan wrote:
> On Tue, 25 Sep 2018 15:58:41 +0100
> Jean-Philippe Brucker  wrote:
> 
>> Hi Jacob,
>> 
>> Just two minor things below, that I noticed while using fault handlers
>> for SVA. From my perspective the series is fine otherwise
>> 
>> On 11/05/2018 21:54, Jacob Pan wrote:
>> > +int iommu_unregister_device_fault_handler(struct device *dev)
>> > +{
>> > +   struct iommu_param *param = dev->iommu_param;
>> > +   int ret = 0;
>> > +
>> > +   if (!param)
>> > +   return -EINVAL;
>> > +
>> > +   mutex_lock(¶m->lock);  
>> 
>> Could we check that param->fault_param isn't NULL here, so that the
>> driver can call this function unconditionally in a cleanup path?
>> 
> sounds good.
> 
>     if (!param || param->fault_param)
>     return -EINVAL;

That would be too convenient... param needs to be checked before taking
the lock, and fault_param accessed after

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-25 Thread Jacob Pan
On Tue, 25 Sep 2018 15:58:41 +0100
Jean-Philippe Brucker  wrote:

> Hi Jacob,
> 
> Just two minor things below, that I noticed while using fault handlers
> for SVA. From my perspective the series is fine otherwise
> 
> On 11/05/2018 21:54, Jacob Pan wrote:
> > +int iommu_unregister_device_fault_handler(struct device *dev)
> > +{
> > +   struct iommu_param *param = dev->iommu_param;
> > +   int ret = 0;
> > +
> > +   if (!param)
> > +   return -EINVAL;
> > +
> > +   mutex_lock(¶m->lock);  
> 
> Could we check that param->fault_param isn't NULL here, so that the
> driver can call this function unconditionally in a cleanup path?
> 
sounds good.

if (!param || param->fault_param)
return -EINVAL;

> > +   /* we cannot unregister handler if there are pending faults
> > */
> > +   if (!list_empty(¶m->fault_param->faults)) {
> > +   ret = -EBUSY;
> > +   goto unlock;
> > +   }
> > +
> > +   kfree(param->fault_param);
> > +   param->fault_param = NULL;
> > +   put_device(dev);
> > +unlock:
> > +   mutex_unlock(¶m->lock);
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
> > +
> > +
> > +/**
> > + * iommu_report_device_fault() - Report fault event to device
> > + * @dev: the device
> > + * @evt: fault event data
> > + *
> > + * Called by IOMMU model specific drivers when fault is detected,
> > typically
> > + * in a threaded IRQ handler.
> > + *
> > + * Return 0 on success, or an error.
> > + */
> > +int iommu_report_device_fault(struct device *dev, struct
> > iommu_fault_event *evt)
> > +{
> > +   int ret = 0;
> > +   struct iommu_fault_event *evt_pending;
> > +   struct iommu_fault_param *fparam;
> > +
> > +   /* iommu_param is allocated when device is added to group */
> > +   if (!dev->iommu_param | !evt)  
> 
> Should probably be ||
> 
your are right, thanks!
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-25 Thread Jean-Philippe Brucker
Hi Jacob,

Just two minor things below, that I noticed while using fault handlers
for SVA. From my perspective the series is fine otherwise

On 11/05/2018 21:54, Jacob Pan wrote:
> +int iommu_unregister_device_fault_handler(struct device *dev)
> +{
> +   struct iommu_param *param = dev->iommu_param;
> +   int ret = 0;
> +
> +   if (!param)
> +   return -EINVAL;
> +
> +   mutex_lock(¶m->lock);

Could we check that param->fault_param isn't NULL here, so that the
driver can call this function unconditionally in a cleanup path?

> +   /* we cannot unregister handler if there are pending faults */
> +   if (!list_empty(¶m->fault_param->faults)) {
> +   ret = -EBUSY;
> +   goto unlock;
> +   }
> +
> +   kfree(param->fault_param);
> +   param->fault_param = NULL;
> +   put_device(dev);
> +unlock:
> +   mutex_unlock(¶m->lock);
> +
> +   return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
> +
> +
> +/**
> + * iommu_report_device_fault() - Report fault event to device
> + * @dev: the device
> + * @evt: fault event data
> + *
> + * Called by IOMMU model specific drivers when fault is detected, typically
> + * in a threaded IRQ handler.
> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_report_device_fault(struct device *dev, struct
> iommu_fault_event *evt)
> +{
> +   int ret = 0;
> +   struct iommu_fault_event *evt_pending;
> +   struct iommu_fault_param *fparam;
> +
> +   /* iommu_param is allocated when device is added to group */
> +   if (!dev->iommu_param | !evt)

Should probably be ||

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-17 Thread Jacob Pan
On Fri, 14 Sep 2018 15:24:41 +0200
Auger Eric  wrote:

> Hi Jacob,
> 
> On 5/11/18 10:54 PM, Jacob Pan wrote:
> > Traditionally, device specific faults are detected and handled
> > within their own device drivers. When IOMMU is enabled, faults such
> > as DMA related transactions are detected by IOMMU. There is no
> > generic reporting mechanism to report faults back to the in-kernel
> > device driver or the guest OS in case of assigned devices.
> > 
> > Faults detected by IOMMU is based on the transaction's source ID
> > which can be reported at per device basis, regardless of the device
> > type is a PCI device or not.
> > 
> > The fault types include recoverable (e.g. page request) and
> > unrecoverable faults(e.g. access error). In most cases, faults can
> > be handled by IOMMU drivers internally. The primary use cases are as
> > follows:
> > 1. page request fault originated from an SVM capable device that is
> > assigned to guest via vIOMMU. In this case, the first level page
> > tables are owned by the guest. Page request must be propagated to
> > the guest to let guest OS fault in the pages then send page
> > response. In this mechanism, the direct receiver of IOMMU fault
> > notification is VFIO, which can relay notification events to QEMU
> > or other user space software.
> > 
> > 2. faults need more subtle handling by device drivers. Other than
> > simply invoke reset function, there are needs to let device driver
> > handle the fault with a smaller impact.
> > 
> > This patchset is intended to create a generic fault report API such
> > that it can scale as follows:
> > - all IOMMU types
> > - PCI and non-PCI devices
> > - recoverable and unrecoverable faults
> > - VFIO and other other in kernel users
> > - DMA & IRQ remapping (TBD)
> > The original idea was brought up by David Woodhouse and discussions
> > summarized at https://lwn.net/Articles/608914/.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/iommu.c | 149
> > +-
> > include/linux/iommu.h |  35 +++- 2 files changed, 181
> > insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 3a49b96..b3f9daf 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) goto err_free_name;
> > }
> >  
> > +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
> > GFP_KERNEL);
> > +   if (!dev->iommu_param) {
> > +   ret = -ENOMEM;
> > +   goto err_free_name;
> > +   }
> > +   mutex_init(&dev->iommu_param->lock);
> > +
> > kobject_get(group->devices_kobj);
> >  
> > dev->iommu_group = group;
> > @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) mutex_unlock(&group->mutex);
> > dev->iommu_group = NULL;
> > kobject_put(group->devices_kobj);
> > +   kfree(dev->iommu_param);
> >  err_free_name:
> > kfree(device->name);
> >  err_remove_link:
> > @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
> > *dev) sysfs_remove_link(&dev->kobj, "iommu_group");
> >  
> > trace_remove_device_from_group(group->id, dev);
> > -
> > +   kfree(dev->iommu_param);
> > kfree(device->name);
> > kfree(device);
> > dev->iommu_group = NULL;
> > @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
> > iommu_group *group,
> > EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
> >  /**
> > + * iommu_register_device_fault_handler() - Register a device fault
> > handler
> > + * @dev: the device
> > + * @handler: the fault handler
> > + * @data: private data passed as argument to the handler
> > + *
> > + * When an IOMMU fault event is received, call this handler with
> > the fault event
> > + * and data as argument. The handler should return 0 on success.
> > If the fault is
> > + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
> > complete
> > + * the fault by calling iommu_page_response() with one of the
> > following
> > + * response code:
> > + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> > + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> > + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
> > reporting
> > + *   page faults if possible.
> > + *
> > + * Return 0 if the fault handler was installed successfully, or an
> > error.
> > + */
> > +int iommu_register_device_fault_handler(struct device *dev,
> > +   iommu_dev_fault_handler_t
> > handler,
> > +   void *data)
> > +{
> > +   struct iommu_param *param = dev->iommu_param;
> > +   int ret = 0;
> > +
> > +   /*
> > +* Device iommu_param should have been allocated when
> > device is
> > +* added to its iommu_group.
> > +*/
> > +   if (!param)
> > +   return -EINVAL;
> > +
> > +   mute

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-14 Thread Auger Eric
Hi Jacob,

On 5/11/18 10:54 PM, Jacob Pan wrote:
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
> 
> Faults detected by IOMMU is based on the transaction's source ID which
> can be reported at per device basis, regardless of the device type is a
> PCI device or not.
> 
> The fault types include recoverable (e.g. page request) and
> unrecoverable faults(e.g. access error). In most cases, faults can be
> handled by IOMMU drivers internally. The primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is
> assigned to guest via vIOMMU. In this case, the first level page tables
> are owned by the guest. Page request must be propagated to the guest to
> let guest OS fault in the pages then send page response. In this
> mechanism, the direct receiver of IOMMU fault notification is VFIO,
> which can relay notification events to QEMU or other user space
> software.
> 
> 2. faults need more subtle handling by device drivers. Other than
> simply invoke reset function, there are needs to let device driver
> handle the fault with a smaller impact.
> 
> This patchset is intended to create a generic fault report API such
> that it can scale as follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions
> summarized at https://lwn.net/Articles/608914/.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149 
> +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(&dev->iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   mutex_unlock(&group->mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(&dev->kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct iommu_group 
> *group,
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the fault 
> event
> + * and data as argument. The handler should return 0 on success. If the 
> fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(¶m->lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto done_unlock;
> + }
> +
> + get_device(dev);
> + 

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-07 Thread Jean-Philippe Brucker
On 07/09/2018 08:11, Auger Eric wrote:
>>> On 09/06/2018 02:42 PM, Jean-Philippe Brucker wrote:
 On 06/09/2018 10:25, Auger Eric wrote:
>> +mutex_lock(&fparam->lock);
>> +list_add_tail(&evt_pending->list, &fparam->faults);
> same doubt as Yi Liu. You cannot rely on the userspace willingness to
> void the queue and deallocate this memory.
>>>
>>> By the way I saw there is a kind of garbage collectors for faults which
>>> wouldn't have received any responses. However I am not sure this removes
>>> the concern of having the fault list on kernel side growing beyond
>>> acceptable limits.
>>
>> How about per-device quotas? (https://lkml.org/lkml/2018/4/23/706 for
>> reference) With PRI the IOMMU driver already sets per-device credits
>> when initializing the device (pci_enable_pri), so if the device behaves
>> properly it shouldn't send new page requests once the number of
>> outstanding ones is maxed out.
> 
> But this needs to work for non PRI use case too?

Only recoverable faults, PRI and stall, are added to the fparam->faults
list, because the kernel needs to make sure that each of these faults
gets a reply, or else they are held in hardware indefinitely.
Non-recoverable faults don't need tracking, the IOMMU API can forget
about them after they're reported. Rate-limiting could be done by the
consumer if it gets flooded by non-recoverable faults, for example by
dropping some of them.

>> Right, an event contains more information than a PRI page request.
>> Stage-2 fields (CLASS, S2, IPA, TTRnW) cannot be represented by
>> iommu_fault_event at the moment.
> 
> Yes I am currently doing the mapping exercise between SMMUv3 events and
> iommu_fault_event and I miss config errors for instance.

We may have initially focused only on guest and userspace config errors
(IOMMU_FAULT_REASON_PASID_FETCH, IOMMU_FAULT_REASON_PASID_INVALID, etc),
since other config errors are most likely a bug in the host IOMMU
driver, and could be reported with pr_err

>  For precise emulation it might be
>> useful to at least add the S2 flag (as a new iommu_fault_reason?), so
>> that when the guest maps stage-1 to an invalid GPA, QEMU could for
>> example inject an external abort.
> 
> Actually we may even need to filter events and return to the guest only
> the S1 related.
>>
>>> queue
>>> size, that may deserve to create different APIs and internal data
>>> structs. Also this may help separating the concerns.
>>
>> It might duplicate them. If the consumer of the event report is a host
>> device driver, the SMMU needs to report a "generic" iommu_fault_event,
>> and if the consumer is VFIO it would report a specialized one
> 
> I am unsure of my understanding of the UNRECOVERABLE error type. Is it
> everything else than a PRI. For instance are all SMMUv3 event errors
> supposed to be put under the IOMMU_FAULT_DMA_UNRECOV umbrella?

I guess it's more clear-cut in VT-d, which defines recoverable and
non-recoverable faults. In SMMUv3, PRI Page Requests are recoverable,
but event errors can also be recoverable if they have the Stall flag set.

Stall is a way for non-PCI endpoints to do SVA, and I have a patch in my
series that sorts events into PAGE_REQ and DMA_UNRECOV before feeding
them to this API: https://patchwork.kernel.org/patch/10395043/

> If I understand correctly there are different consumers for PRI and
> unrecoverable data, so why not having 2 different APIs.

My reasoning was that for virtualization they go through the same
channel, VFIO, until the guest or the vIOMMU dispatches them depending
on their type, so we might as well use the same API.

In addition, host device drivers might also want to handle stall or PRI
events themselves instead of relying on the SVA infrastructure. For
example the MSM GPU with SMMUv2: https://patchwork.kernel.org/patch/9953803/

>>> My remark also
>>> stems from the fact the SMMU uses 2 different queues, whose size can
>>> also be different.
>>
>> Hm, for PRI requests the kernel-userspace queue size should actually be
>> the number of PRI credits for that device. Hadn't thought about it
>> before, where do we pass that info to userspace?
> Cannot help here at the moment, sorry.
>  For fault events, the
>> queue could be as big as the SMMU event queue, though using all that
>> space might be wasteful.
> The guest has its own programming of the SMMU_EVENTQ_BASE.LOG2SIZE. This
> could be used to program the SW fifo
> 
>  Non-stalled events should be rare and reporting
>> them isn't urgent. Stalled ones would need the number of stall credits I
>> mentioned above, which realistically will be a lot less than the SMMU
>> event queue size. Given that a device will use either PRI or stall but
>> not both, I still think events and PRI could go through the same queue.
> Did I get it right PRI is for PCIe and STALL for non PCIe? But all that
> stuff also is related to Page Request use case, right?

Yes, a stall event is a page request from a non-PCI device,

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-07 Thread Auger Eric
Hi Jean-Philippe,

On 09/06/2018 07:06 PM, Jean-Philippe Brucker wrote:
> On 06/09/2018 14:14, Auger Eric wrote:
>> Hi Jean-Philippe,
>>
>> On 09/06/2018 02:42 PM, Jean-Philippe Brucker wrote:
>>> On 06/09/2018 10:25, Auger Eric wrote:
> + mutex_lock(&fparam->lock);
> + list_add_tail(&evt_pending->list, &fparam->faults);
 same doubt as Yi Liu. You cannot rely on the userspace willingness to
 void the queue and deallocate this memory.
>>
>> By the way I saw there is a kind of garbage collectors for faults which
>> wouldn't have received any responses. However I am not sure this removes
>> the concern of having the fault list on kernel side growing beyond
>> acceptable limits.
> 
> How about per-device quotas? (https://lkml.org/lkml/2018/4/23/706 for
> reference) With PRI the IOMMU driver already sets per-device credits
> when initializing the device (pci_enable_pri), so if the device behaves
> properly it shouldn't send new page requests once the number of
> outstanding ones is maxed out.

But this needs to work for non PRI use case too?
> 
> The stall mode of SMMU doesn't have per-device limit, and depending on
> the implementation it might be easy for one guest using stall to prevent
> other guests from receiving faults. For this reason we'll have to
> enforce a per-device stall quota in the SMMU driver, and immediately
> terminate faults that exceed this quota. We could easily do the same for
> PRI, if we don't trust devices to follow the spec. The difficult part is
> finding the right number of credits...
> 
>>> Host device drivers that use this API to be notified on fault can't deal
>>> with arch-specific event formats (SMMU event, Vt-d fault event, etc), so
>>> the APIs should be arch-agnostic. Given that requirement, using a single
>>> iommu_fault_event structure for both PRI and event queues made sense,
>>> especially since the even queue can have stall events that look a lot
>>> like PRI page requests.
>> I understand the data structure needs to be generic. Now I guess PRI
>> events and other standard translator error events (that can happen
>> without PRI) may have different characteristics in event fields,
> 
> Right, an event contains more information than a PRI page request.
> Stage-2 fields (CLASS, S2, IPA, TTRnW) cannot be represented by
> iommu_fault_event at the moment.

Yes I am currently doing the mapping exercise between SMMUv3 events and
iommu_fault_event and I miss config errors for instance.
 For precise emulation it might be
> useful to at least add the S2 flag (as a new iommu_fault_reason?), so
> that when the guest maps stage-1 to an invalid GPA, QEMU could for
> example inject an external abort.

Actually we may even need to filter events and return to the guest only
the S1 related.
> 
>> queue
>> size, that may deserve to create different APIs and internal data
>> structs. Also this may help separating the concerns.
> 
> It might duplicate them. If the consumer of the event report is a host
> device driver, the SMMU needs to report a "generic" iommu_fault_event,
> and if the consumer is VFIO it would report a specialized one

I am unsure of my understanding of the UNRECOVERABLE error type. Is it
everything else than a PRI. For instance are all SMMUv3 event errors
supposed to be put under the IOMMU_FAULT_DMA_UNRECOV umbrella?

If I understand correctly there are different consumers for PRI and
unrecoverable data, so why not having 2 different APIs.
> 
>> My remark also
>> stems from the fact the SMMU uses 2 different queues, whose size can
>> also be different.
> 
> Hm, for PRI requests the kernel-userspace queue size should actually be
> the number of PRI credits for that device. Hadn't thought about it
> before, where do we pass that info to userspace?
Cannot help here at the moment, sorry.
 For fault events, the
> queue could be as big as the SMMU event queue, though using all that
> space might be wasteful.
The guest has its own programming of the SMMU_EVENTQ_BASE.LOG2SIZE. This
could be used to program the SW fifo

 Non-stalled events should be rare and reporting
> them isn't urgent. Stalled ones would need the number of stall credits I
> mentioned above, which realistically will be a lot less than the SMMU
> event queue size. Given that a device will use either PRI or stall but
> not both, I still think events and PRI could go through the same queue.
Did I get it right PRI is for PCIe and STALL for non PCIe? But all that
stuff also is related to Page Request use case, right?

Thanks

Eric
> 
> Thanks,
> Jean
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-06 Thread Jean-Philippe Brucker
On 06/09/2018 14:14, Auger Eric wrote:
> Hi Jean-Philippe,
> 
> On 09/06/2018 02:42 PM, Jean-Philippe Brucker wrote:
>> On 06/09/2018 10:25, Auger Eric wrote:
 +  mutex_lock(&fparam->lock);
 +  list_add_tail(&evt_pending->list, &fparam->faults);
>>> same doubt as Yi Liu. You cannot rely on the userspace willingness to
>>> void the queue and deallocate this memory.
> 
> By the way I saw there is a kind of garbage collectors for faults which
> wouldn't have received any responses. However I am not sure this removes
> the concern of having the fault list on kernel side growing beyond
> acceptable limits.

How about per-device quotas? (https://lkml.org/lkml/2018/4/23/706 for
reference) With PRI the IOMMU driver already sets per-device credits
when initializing the device (pci_enable_pri), so if the device behaves
properly it shouldn't send new page requests once the number of
outstanding ones is maxed out.

The stall mode of SMMU doesn't have per-device limit, and depending on
the implementation it might be easy for one guest using stall to prevent
other guests from receiving faults. For this reason we'll have to
enforce a per-device stall quota in the SMMU driver, and immediately
terminate faults that exceed this quota. We could easily do the same for
PRI, if we don't trust devices to follow the spec. The difficult part is
finding the right number of credits...

>> Host device drivers that use this API to be notified on fault can't deal
>> with arch-specific event formats (SMMU event, Vt-d fault event, etc), so
>> the APIs should be arch-agnostic. Given that requirement, using a single
>> iommu_fault_event structure for both PRI and event queues made sense,
>> especially since the even queue can have stall events that look a lot
>> like PRI page requests.
> I understand the data structure needs to be generic. Now I guess PRI
> events and other standard translator error events (that can happen
> without PRI) may have different characteristics in event fields,

Right, an event contains more information than a PRI page request.
Stage-2 fields (CLASS, S2, IPA, TTRnW) cannot be represented by
iommu_fault_event at the moment. For precise emulation it might be
useful to at least add the S2 flag (as a new iommu_fault_reason?), so
that when the guest maps stage-1 to an invalid GPA, QEMU could for
example inject an external abort.

> queue
> size, that may deserve to create different APIs and internal data
> structs. Also this may help separating the concerns.

It might duplicate them. If the consumer of the event report is a host
device driver, the SMMU needs to report a "generic" iommu_fault_event,
and if the consumer is VFIO it would report a specialized one

> My remark also
> stems from the fact the SMMU uses 2 different queues, whose size can
> also be different.

Hm, for PRI requests the kernel-userspace queue size should actually be
the number of PRI credits for that device. Hadn't thought about it
before, where do we pass that info to userspace? For fault events, the
queue could be as big as the SMMU event queue, though using all that
space might be wasteful. Non-stalled events should be rare and reporting
them isn't urgent. Stalled ones would need the number of stall credits I
mentioned above, which realistically will be a lot less than the SMMU
event queue size. Given that a device will use either PRI or stall but
not both, I still think events and PRI could go through the same queue.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-06 Thread Auger Eric
Hi Jean-Philippe,

On 09/06/2018 02:42 PM, Jean-Philippe Brucker wrote:
> On 06/09/2018 10:25, Auger Eric wrote:
>>> +   mutex_lock(&fparam->lock);
>>> +   list_add_tail(&evt_pending->list, &fparam->faults);
>> same doubt as Yi Liu. You cannot rely on the userspace willingness to
>> void the queue and deallocate this memory.

By the way I saw there is a kind of garbage collectors for faults which
wouldn't have received any responses. However I am not sure this removes
the concern of having the fault list on kernel side growing beyond
acceptable limits.
>>
>> SMMUv3 holds a queue of events whose size is implementation dependent.
>> I think such a queue should be available at SW level and its size should
>> be negotiated.
> 
> Note that this fault API can also be used by host device drivers that
> want to be notified on fault, in which case a direct callback seems
> easier to use, and perhaps more efficient than an intermediate queue.

> 
> When injecting faults into userspace it makes sense to batch events, to
> avoid context switches. Even though that queue management should
> probably be done by VFIO, the IOMMU API has to help in some way, at
> least to tell VFIO when the IOMMU driver is done processing a batch of
> event.
Yes I am currently investigating the usage of a kfifo in
vfio_iommu_type1, filled by the direct callback.
> 
>> Note SMMU has separate queues for PRI and fault events. Here you use the
>> same queue for all events. I don't know if it would make sense to have
>> separate APIs?
> 
> Host device drivers that use this API to be notified on fault can't deal
> with arch-specific event formats (SMMU event, Vt-d fault event, etc), so
> the APIs should be arch-agnostic. Given that requirement, using a single
> iommu_fault_event structure for both PRI and event queues made sense,
> especially since the even queue can have stall events that look a lot
> like PRI page requests.
I understand the data structure needs to be generic. Now I guess PRI
events and other standard translator error events (that can happen
without PRI) may have different characteristics in event fields, queue
size, that may deserve to create different APIs and internal data
structs. Also this may help separating the concerns. My remark also
stems from the fact the SMMU uses 2 different queues, whose size can
also be different.

Thanks

Eric
> 
> Or do you mean separate APIs for recoverable and non-recoverable faults?
> Using the same queue for PRI and stall event, and a separate one for
> non-recoverable events?
> 
> Separate queues may be useful for the device driver scenario
> (non-virtualization), where recoverable faults are handled by
> io-pgfaults while non-recoverable ones could be reported directly to the
> device driver. For this case I was thinking of adding a
> multiple-consumer thing: both io-pgfaults and the device driver register
> a fault handler. io-pgfault handles recoverable ones and what's left
> goes to the device driver. This could also allow the device driver to be
> notified when io-pgfault doesn't successfully handle a fault
> (handle_mm_fault returns an error).
> 
> Thanks,
> Jean
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-06 Thread Jean-Philippe Brucker
On 06/09/2018 10:25, Auger Eric wrote:
>> +mutex_lock(&fparam->lock);
>> +list_add_tail(&evt_pending->list, &fparam->faults);
> same doubt as Yi Liu. You cannot rely on the userspace willingness to
> void the queue and deallocate this memory.
> 
> SMMUv3 holds a queue of events whose size is implementation dependent.
> I think such a queue should be available at SW level and its size should
> be negotiated.

Note that this fault API can also be used by host device drivers that
want to be notified on fault, in which case a direct callback seems
easier to use, and perhaps more efficient than an intermediate queue.

When injecting faults into userspace it makes sense to batch events, to
avoid context switches. Even though that queue management should
probably be done by VFIO, the IOMMU API has to help in some way, at
least to tell VFIO when the IOMMU driver is done processing a batch of
event.

> Note SMMU has separate queues for PRI and fault events. Here you use the
> same queue for all events. I don't know if it would make sense to have
> separate APIs?

Host device drivers that use this API to be notified on fault can't deal
with arch-specific event formats (SMMU event, Vt-d fault event, etc), so
the APIs should be arch-agnostic. Given that requirement, using a single
iommu_fault_event structure for both PRI and event queues made sense,
especially since the even queue can have stall events that look a lot
like PRI page requests.

Or do you mean separate APIs for recoverable and non-recoverable faults?
Using the same queue for PRI and stall event, and a separate one for
non-recoverable events?

Separate queues may be useful for the device driver scenario
(non-virtualization), where recoverable faults are handled by
io-pgfaults while non-recoverable ones could be reported directly to the
device driver. For this case I was thinking of adding a
multiple-consumer thing: both io-pgfaults and the device driver register
a fault handler. io-pgfault handles recoverable ones and what's left
goes to the device driver. This could also allow the device driver to be
notified when io-pgfault doesn't successfully handle a fault
(handle_mm_fault returns an error).

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-06 Thread Auger Eric
Hi Jacob,

On 05/11/2018 10:54 PM, Jacob Pan wrote:
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
> 
> Faults detected by IOMMU is based on the transaction's source ID which
> can be reported at per device basis, regardless of the device type is a
> PCI device or not.
> 
> The fault types include recoverable (e.g. page request) and
> unrecoverable faults(e.g. access error). In most cases, faults can be
> handled by IOMMU drivers internally. The primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is
> assigned to guest via vIOMMU. In this case, the first level page tables
> are owned by the guest. Page request must be propagated to the guest to
> let guest OS fault in the pages then send page response. In this
> mechanism, the direct receiver of IOMMU fault notification is VFIO,
> which can relay notification events to QEMU or other user space
> software.
> 
> 2. faults need more subtle handling by device drivers. Other than
> simply invoke reset function, there are needs to let device driver
> handle the fault with a smaller impact.
> 
> This patchset is intended to create a generic fault report API such
> that it can scale as follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions
> summarized at https://lwn.net/Articles/608914/.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149 
> +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(&dev->iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   mutex_unlock(&group->mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(&dev->kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct iommu_group 
> *group,
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the fault 
> event
> + * and data as argument. The handler should return 0 on success. If the 
> fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
iommu_page_response name looks too specific to PRI use case. why not
using iommu_fault_response.
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
Same here s/IOMMU_PAGE_RESP/IOMMU_PAGE_RESP

That way I can easily reuse the API for SMMU nested stage handing.
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(¶m->lock);

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-21 Thread Jacob Pan
On Thu, 17 May 2018 23:22:43 +
"Liu, Yi L"  wrote:

> > So as long as in-kernel PRQ handling can do queuing, there is no
> > need for queuing in the host reporting path.  
> 
> Will it affect current interface? Here the handler only get an "evt"
> per a PRQ IRQ. And I suppose vfio needs not rely on host iommu
> queuing?

I don't think it needs iommu driver queuing.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Liu, Yi L
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Thursday, May 17, 2018 11:59 PM
> On Thu, 17 May 2018 11:41:56 +
> "Liu, Yi L"  wrote:
> 
> > > +int iommu_report_device_fault(struct device *dev, struct
> > > +iommu_fault_event *evt) {
> > > + int ret = 0;
> > > + struct iommu_fault_event *evt_pending;
> > > + struct iommu_fault_param *fparam;
> > > +
> > > + /* iommu_param is allocated when device is added to group
> > > */
> > > + if (!dev->iommu_param | !evt)
> > > + return -EINVAL;
> > > + /* we only report device fault if there is a handler
> > > registered */
> > > + mutex_lock(&dev->iommu_param->lock);
> > > + if (!dev->iommu_param->fault_param ||
> > > + !dev->iommu_param->fault_param->handler) {
> > > + ret = -EINVAL;
> > > + goto done_unlock;
> > > + }
> > > + fparam = dev->iommu_param->fault_param;
> > > + if (evt->type == IOMMU_FAULT_PAGE_REQ && evt->last_req) {
> > > + evt_pending = kmemdup(evt, sizeof(struct
> > > iommu_fault_event),
> > > + GFP_KERNEL);
> > > + if (!evt_pending) {
> > > + ret = -ENOMEM;
> > > + goto done_unlock;
> > > + }
> > > + mutex_lock(&fparam->lock);
> > > + list_add_tail(&evt_pending->list,
> > > &fparam->faults);
> >
> > I may missed it. Here only see list add, how about removing? Who would
> > remove entry from the fault list?
> >
> deletion of the pending event is in page response function (int
> iommu_page_response), once iommu driver finds a matching response for the
> pending request, it will delete the pending event.
> 
> if the response never came, right now we don't delete it, just gives warning.

Got it.

> 
> > > + mutex_unlock(&fparam->lock);
> > > + }
> > > + ret = fparam->handler(evt, fparam->data);
> >
> > I remember you mentioned there will be a queue to store the faults.
> > Is it in the fparam->faults list? Or there is no such queue?
> There are two use cases:
> case A: guest SVA, PRQ events are reported outside IOMMU subsystem,
>   e.g. vfio
> case B: in-kernel
> 
> The io page fault queuing is Jean's patchset, mostly for case B (in-kernel IO 
> page
> fault handling). I will convert intel-svm to Jean's io page fault mechanism 
> so that we
> can also have parallel and out of order queuing of PRQ. I still need some 
> time to
> evaluate intel specific needs such as streaming page request/response.
> 
> For case A, there is no queuing in host IOMMU driver. My understanding of the 
> flow
> is as the following:
> 1. host IOMMU receives PRQ
> 2. host IOMMU driver reports PRQ fault event to registered called, i.e.
> vfio
> 3. VFIO reports fault event to QEMU
> 4. QEMU injects PRQ to guest
> 5. Guest IOMMU driver receives PRQ in IRQ 6. Guest IOMMU driver queue PRQ by
> groups, PASID.

Correct.

> So as long as in-kernel PRQ handling can do queuing, there is no need for 
> queuing in
> the host reporting path.

Will it affect current interface? Here the handler only get an "evt" per a PRQ 
IRQ. And I suppose
vfio needs not rely on host iommu queuing?

Thanks,
Yi Liu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Jacob Pan
On Thu, 17 May 2018 11:41:56 +
"Liu, Yi L"  wrote:

> > +int iommu_report_device_fault(struct device *dev, struct
> > +iommu_fault_event *evt) {
> > +   int ret = 0;
> > +   struct iommu_fault_event *evt_pending;
> > +   struct iommu_fault_param *fparam;
> > +
> > +   /* iommu_param is allocated when device is added to group
> > */
> > +   if (!dev->iommu_param | !evt)
> > +   return -EINVAL;
> > +   /* we only report device fault if there is a handler
> > registered */
> > +   mutex_lock(&dev->iommu_param->lock);
> > +   if (!dev->iommu_param->fault_param ||
> > +   !dev->iommu_param->fault_param->handler) {
> > +   ret = -EINVAL;
> > +   goto done_unlock;
> > +   }
> > +   fparam = dev->iommu_param->fault_param;
> > +   if (evt->type == IOMMU_FAULT_PAGE_REQ && evt->last_req) {
> > +   evt_pending = kmemdup(evt, sizeof(struct
> > iommu_fault_event),
> > +   GFP_KERNEL);
> > +   if (!evt_pending) {
> > +   ret = -ENOMEM;
> > +   goto done_unlock;
> > +   }
> > +   mutex_lock(&fparam->lock);
> > +   list_add_tail(&evt_pending->list,
> > &fparam->faults);  
> 
> I may missed it. Here only see list add, how about removing? Who
> would remove entry from the fault list?
> 
deletion of the pending event is in page response function (int
iommu_page_response), once iommu driver finds a matching response for
the pending request, it will delete the pending event.

if the response never came, right now we don't delete it, just gives
warning.

> > +   mutex_unlock(&fparam->lock);
> > +   }
> > +   ret = fparam->handler(evt, fparam->data);  
> 
> I remember you mentioned there will be a queue to store the faults.
> Is it in the fparam->faults list? Or there is no such queue?
There are two use cases:
case A: guest SVA, PRQ events are reported outside IOMMU subsystem,
e.g. vfio
case B: in-kernel

The io page fault queuing is Jean's patchset, mostly for case
B (in-kernel IO page fault handling). I will convert intel-svm to Jean's
io page fault mechanism so that we can also have parallel and out of
order queuing of PRQ. I still need some time to evaluate intel specific
needs such as streaming page request/response.

For case A, there is no queuing in host IOMMU driver. My understanding
of the flow is as the following:
1. host IOMMU receives PRQ
2. host IOMMU driver reports PRQ fault event to registered called, i.e.
vfio
3. VFIO reports fault event to QEMU
4. QEMU injects PRQ to guest
5. Guest IOMMU driver receives PRQ in IRQ
6. Guest IOMMU driver queue PRQ by groups, PASID.
So as long as in-kernel PRQ handling can do queuing, there is no need
for queuing in the host reporting path.

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Liu, Yi L
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Saturday, May 12, 2018 4:54 AM
> 
> Traditionally, device specific faults are detected and handled within their 
> own device
> drivers. When IOMMU is enabled, faults such as DMA related transactions are
> detected by IOMMU. There is no generic reporting mechanism to report faults 
> back
> to the in-kernel device driver or the guest OS in case of assigned devices.
> 
> Faults detected by IOMMU is based on the transaction's source ID which can be
> reported at per device basis, regardless of the device type is a PCI device 
> or not.
> 
> The fault types include recoverable (e.g. page request) and unrecoverable 
> faults(e.g.
> access error). In most cases, faults can be handled by IOMMU drivers 
> internally. The
> primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is assigned 
> to
> guest via vIOMMU. In this case, the first level page tables are owned by the 
> guest.
> Page request must be propagated to the guest to let guest OS fault in the 
> pages then
> send page response. In this mechanism, the direct receiver of IOMMU fault
> notification is VFIO, which can relay notification events to QEMU or other 
> user space
> software.
> 
> 2. faults need more subtle handling by device drivers. Other than simply 
> invoke reset
> function, there are needs to let device driver handle the fault with a 
> smaller impact.
> 
> This patchset is intended to create a generic fault report API such that it 
> can scale as
> follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions summarized
> at https://lwn.net/Articles/608914/.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149
> +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group,
> struct device *dev)
>   goto err_free_name;
>   }
> 
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(&dev->iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
> 
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group,
> struct device *dev)
>   mutex_unlock(&group->mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(&dev->kobj, "iommu_group");
> 
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
> iommu_group *group,  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> 
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault
> +handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the
> +fault event
> + * and data as argument. The handler should return 0 on success. If the
> +fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(¶m->lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> +   

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-14 Thread Lu Baolu
Hi,

On 05/15/2018 04:55 AM, Jacob Pan wrote:
> On Mon, 14 May 2018 14:01:06 +0800
> Lu Baolu  wrote:
>
>> Hi,
>>
>> On 05/12/2018 04:54 AM, Jacob Pan wrote:
>>> Traditionally, device specific faults are detected and handled
>>> within their own device drivers. When IOMMU is enabled, faults such
>>> as DMA related transactions are detected by IOMMU. There is no
>>> generic reporting mechanism to report faults back to the in-kernel
>>> device driver or the guest OS in case of assigned devices.
>>>
>>> Faults detected by IOMMU is based on the transaction's source ID
>>> which can be reported at per device basis, regardless of the device
>>> type is a PCI device or not.
>>>
>>> The fault types include recoverable (e.g. page request) and
>>> unrecoverable faults(e.g. access error). In most cases, faults can
>>> be handled by IOMMU drivers internally. The primary use cases are as
>>> follows:
>>> 1. page request fault originated from an SVM capable device that is
>>> assigned to guest via vIOMMU. In this case, the first level page
>>> tables are owned by the guest. Page request must be propagated to
>>> the guest to let guest OS fault in the pages then send page
>>> response. In this mechanism, the direct receiver of IOMMU fault
>>> notification is VFIO, which can relay notification events to QEMU
>>> or other user space software.
>>>
>>> 2. faults need more subtle handling by device drivers. Other than
>>> simply invoke reset function, there are needs to let device driver
>>> handle the fault with a smaller impact.
>>>
>>> This patchset is intended to create a generic fault report API such
>>> that it can scale as follows:
>>> - all IOMMU types
>>> - PCI and non-PCI devices
>>> - recoverable and unrecoverable faults
>>> - VFIO and other other in kernel users
>>> - DMA & IRQ remapping (TBD)
>>> The original idea was brought up by David Woodhouse and discussions
>>> summarized at https://lwn.net/Articles/608914/.
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Ashok Raj 
>>> Signed-off-by: Jean-Philippe Brucker 
>>> ---
>>>  drivers/iommu/iommu.c | 149
>>> +-
>>> include/linux/iommu.h |  35 +++- 2 files changed, 181
>>> insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index 3a49b96..b3f9daf 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
>>> *group, struct device *dev) goto err_free_name;
>>> }
>>>  
>>> +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
>>> GFP_KERNEL);
>>> +   if (!dev->iommu_param) {
>>> +   ret = -ENOMEM;
>>> +   goto err_free_name;
>>> +   }
>>> +   mutex_init(&dev->iommu_param->lock);
>>> +
>>> kobject_get(group->devices_kobj);
>>>  
>>> dev->iommu_group = group;
>>> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
>>> *group, struct device *dev) mutex_unlock(&group->mutex);
>>> dev->iommu_group = NULL;
>>> kobject_put(group->devices_kobj);
>>> +   kfree(dev->iommu_param);
>>>  err_free_name:
>>> kfree(device->name);
>>>  err_remove_link:
>>> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
>>> *dev) sysfs_remove_link(&dev->kobj, "iommu_group");
>>>  
>>> trace_remove_device_from_group(group->id, dev);
>>> -
>>> +   kfree(dev->iommu_param);
>>> kfree(device->name);
>>> kfree(device);
>>> dev->iommu_group = NULL;
>>> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
>>> iommu_group *group,
>>> EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
>>>  /**
>>> + * iommu_register_device_fault_handler() - Register a device fault
>>> handler
>>> + * @dev: the device
>>> + * @handler: the fault handler
>>> + * @data: private data passed as argument to the handler
>>> + *
>>> + * When an IOMMU fault event is received, call this handler with
>>> the fault event
>>> + * and data as argument. The handler should return 0 on success.
>>> If the fault is
>>> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
>>> complete
>>> + * the fault by calling iommu_page_response() with one of the
>>> following
>>> + * response code:
>>> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
>>> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
>>> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
>>> reporting
>>> + *   page faults if possible.
>>> + *
>>> + * Return 0 if the fault handler was installed successfully, or an
>>> error.
>>> + */
>>> +int iommu_register_device_fault_handler(struct device *dev,
>>> +   iommu_dev_fault_handler_t
>>> handler,
>>> +   void *data)
>>> +{
>>> +   struct iommu_param *param = dev->iommu_param;
>>> +   int ret = 0;
>>> +
>>> +   /*
>>> +* Device iommu_param should have been allocated when
>>> device is
>>> +* added to its iommu_group.
>>> +*/
>>> +   if (!param)
>>> +   

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 14:01:06 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:54 AM, Jacob Pan wrote:
> > Traditionally, device specific faults are detected and handled
> > within their own device drivers. When IOMMU is enabled, faults such
> > as DMA related transactions are detected by IOMMU. There is no
> > generic reporting mechanism to report faults back to the in-kernel
> > device driver or the guest OS in case of assigned devices.
> >
> > Faults detected by IOMMU is based on the transaction's source ID
> > which can be reported at per device basis, regardless of the device
> > type is a PCI device or not.
> >
> > The fault types include recoverable (e.g. page request) and
> > unrecoverable faults(e.g. access error). In most cases, faults can
> > be handled by IOMMU drivers internally. The primary use cases are as
> > follows:
> > 1. page request fault originated from an SVM capable device that is
> > assigned to guest via vIOMMU. In this case, the first level page
> > tables are owned by the guest. Page request must be propagated to
> > the guest to let guest OS fault in the pages then send page
> > response. In this mechanism, the direct receiver of IOMMU fault
> > notification is VFIO, which can relay notification events to QEMU
> > or other user space software.
> >
> > 2. faults need more subtle handling by device drivers. Other than
> > simply invoke reset function, there are needs to let device driver
> > handle the fault with a smaller impact.
> >
> > This patchset is intended to create a generic fault report API such
> > that it can scale as follows:
> > - all IOMMU types
> > - PCI and non-PCI devices
> > - recoverable and unrecoverable faults
> > - VFIO and other other in kernel users
> > - DMA & IRQ remapping (TBD)
> > The original idea was brought up by David Woodhouse and discussions
> > summarized at https://lwn.net/Articles/608914/.
> >
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/iommu.c | 149
> > +-
> > include/linux/iommu.h |  35 +++- 2 files changed, 181
> > insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 3a49b96..b3f9daf 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) goto err_free_name;
> > }
> >  
> > +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
> > GFP_KERNEL);
> > +   if (!dev->iommu_param) {
> > +   ret = -ENOMEM;
> > +   goto err_free_name;
> > +   }
> > +   mutex_init(&dev->iommu_param->lock);
> > +
> > kobject_get(group->devices_kobj);
> >  
> > dev->iommu_group = group;
> > @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) mutex_unlock(&group->mutex);
> > dev->iommu_group = NULL;
> > kobject_put(group->devices_kobj);
> > +   kfree(dev->iommu_param);
> >  err_free_name:
> > kfree(device->name);
> >  err_remove_link:
> > @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
> > *dev) sysfs_remove_link(&dev->kobj, "iommu_group");
> >  
> > trace_remove_device_from_group(group->id, dev);
> > -
> > +   kfree(dev->iommu_param);
> > kfree(device->name);
> > kfree(device);
> > dev->iommu_group = NULL;
> > @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
> > iommu_group *group,
> > EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
> >  /**
> > + * iommu_register_device_fault_handler() - Register a device fault
> > handler
> > + * @dev: the device
> > + * @handler: the fault handler
> > + * @data: private data passed as argument to the handler
> > + *
> > + * When an IOMMU fault event is received, call this handler with
> > the fault event
> > + * and data as argument. The handler should return 0 on success.
> > If the fault is
> > + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
> > complete
> > + * the fault by calling iommu_page_response() with one of the
> > following
> > + * response code:
> > + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> > + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> > + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
> > reporting
> > + *   page faults if possible.
> > + *
> > + * Return 0 if the fault handler was installed successfully, or an
> > error.
> > + */
> > +int iommu_register_device_fault_handler(struct device *dev,
> > +   iommu_dev_fault_handler_t
> > handler,
> > +   void *data)
> > +{
> > +   struct iommu_param *param = dev->iommu_param;
> > +   int ret = 0;
> > +
> > +   /*
> > +* Device iommu_param should have been allocated when
> > device is
> > +* added to its iommu_group.
> > +*/
> > +   if (!param)
> > +   return -EINVAL;
> > +
> > +   mutex_lock(¶m->

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-13 Thread Lu Baolu
Hi,

On 05/12/2018 04:54 AM, Jacob Pan wrote:
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
>
> Faults detected by IOMMU is based on the transaction's source ID which
> can be reported at per device basis, regardless of the device type is a
> PCI device or not.
>
> The fault types include recoverable (e.g. page request) and
> unrecoverable faults(e.g. access error). In most cases, faults can be
> handled by IOMMU drivers internally. The primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is
> assigned to guest via vIOMMU. In this case, the first level page tables
> are owned by the guest. Page request must be propagated to the guest to
> let guest OS fault in the pages then send page response. In this
> mechanism, the direct receiver of IOMMU fault notification is VFIO,
> which can relay notification events to QEMU or other user space
> software.
>
> 2. faults need more subtle handling by device drivers. Other than
> simply invoke reset function, there are needs to let device driver
> handle the fault with a smaller impact.
>
> This patchset is intended to create a generic fault report API such
> that it can scale as follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions
> summarized at https://lwn.net/Articles/608914/.
>
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149 
> +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(&dev->iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   mutex_unlock(&group->mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(&dev->kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct iommu_group 
> *group,
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the fault 
> event
> + * and data as argument. The handler should return 0 on success. If the 
> fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(¶m->lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto done_unlock;
> + }
> +
> + get_device(dev);
> + param