Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-06 Thread Auger Eric
Hi Jean-Philippe,

On 09/06/2018 02:42 PM, Jean-Philippe Brucker wrote:
> On 06/09/2018 10:25, Auger Eric wrote:
>>> +   mutex_lock(>lock);
>>> +   list_add_tail(_pending->list, >faults);
>> same doubt as Yi Liu. You cannot rely on the userspace willingness to
>> void the queue and deallocate this memory.

By the way I saw there is a kind of garbage collectors for faults which
wouldn't have received any responses. However I am not sure this removes
the concern of having the fault list on kernel side growing beyond
acceptable limits.
>>
>> SMMUv3 holds a queue of events whose size is implementation dependent.
>> I think such a queue should be available at SW level and its size should
>> be negotiated.
> 
> Note that this fault API can also be used by host device drivers that
> want to be notified on fault, in which case a direct callback seems
> easier to use, and perhaps more efficient than an intermediate queue.

> 
> When injecting faults into userspace it makes sense to batch events, to
> avoid context switches. Even though that queue management should
> probably be done by VFIO, the IOMMU API has to help in some way, at
> least to tell VFIO when the IOMMU driver is done processing a batch of
> event.
Yes I am currently investigating the usage of a kfifo in
vfio_iommu_type1, filled by the direct callback.
> 
>> Note SMMU has separate queues for PRI and fault events. Here you use the
>> same queue for all events. I don't know if it would make sense to have
>> separate APIs?
> 
> Host device drivers that use this API to be notified on fault can't deal
> with arch-specific event formats (SMMU event, Vt-d fault event, etc), so
> the APIs should be arch-agnostic. Given that requirement, using a single
> iommu_fault_event structure for both PRI and event queues made sense,
> especially since the even queue can have stall events that look a lot
> like PRI page requests.
I understand the data structure needs to be generic. Now I guess PRI
events and other standard translator error events (that can happen
without PRI) may have different characteristics in event fields, queue
size, that may deserve to create different APIs and internal data
structs. Also this may help separating the concerns. My remark also
stems from the fact the SMMU uses 2 different queues, whose size can
also be different.

Thanks

Eric
> 
> Or do you mean separate APIs for recoverable and non-recoverable faults?
> Using the same queue for PRI and stall event, and a separate one for
> non-recoverable events?
> 
> Separate queues may be useful for the device driver scenario
> (non-virtualization), where recoverable faults are handled by
> io-pgfaults while non-recoverable ones could be reported directly to the
> device driver. For this case I was thinking of adding a
> multiple-consumer thing: both io-pgfaults and the device driver register
> a fault handler. io-pgfault handles recoverable ones and what's left
> goes to the device driver. This could also allow the device driver to be
> notified when io-pgfault doesn't successfully handle a fault
> (handle_mm_fault returns an error).
> 
> Thanks,
> Jean
> 


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-09-06 Thread Auger Eric
Hi Jean-Philippe,

On 09/06/2018 02:42 PM, Jean-Philippe Brucker wrote:
> On 06/09/2018 10:25, Auger Eric wrote:
>>> +   mutex_lock(>lock);
>>> +   list_add_tail(_pending->list, >faults);
>> same doubt as Yi Liu. You cannot rely on the userspace willingness to
>> void the queue and deallocate this memory.

By the way I saw there is a kind of garbage collectors for faults which
wouldn't have received any responses. However I am not sure this removes
the concern of having the fault list on kernel side growing beyond
acceptable limits.
>>
>> SMMUv3 holds a queue of events whose size is implementation dependent.
>> I think such a queue should be available at SW level and its size should
>> be negotiated.
> 
> Note that this fault API can also be used by host device drivers that
> want to be notified on fault, in which case a direct callback seems
> easier to use, and perhaps more efficient than an intermediate queue.

> 
> When injecting faults into userspace it makes sense to batch events, to
> avoid context switches. Even though that queue management should
> probably be done by VFIO, the IOMMU API has to help in some way, at
> least to tell VFIO when the IOMMU driver is done processing a batch of
> event.
Yes I am currently investigating the usage of a kfifo in
vfio_iommu_type1, filled by the direct callback.
> 
>> Note SMMU has separate queues for PRI and fault events. Here you use the
>> same queue for all events. I don't know if it would make sense to have
>> separate APIs?
> 
> Host device drivers that use this API to be notified on fault can't deal
> with arch-specific event formats (SMMU event, Vt-d fault event, etc), so
> the APIs should be arch-agnostic. Given that requirement, using a single
> iommu_fault_event structure for both PRI and event queues made sense,
> especially since the even queue can have stall events that look a lot
> like PRI page requests.
I understand the data structure needs to be generic. Now I guess PRI
events and other standard translator error events (that can happen
without PRI) may have different characteristics in event fields, queue
size, that may deserve to create different APIs and internal data
structs. Also this may help separating the concerns. My remark also
stems from the fact the SMMU uses 2 different queues, whose size can
also be different.

Thanks

Eric
> 
> Or do you mean separate APIs for recoverable and non-recoverable faults?
> Using the same queue for PRI and stall event, and a separate one for
> non-recoverable events?
> 
> Separate queues may be useful for the device driver scenario
> (non-virtualization), where recoverable faults are handled by
> io-pgfaults while non-recoverable ones could be reported directly to the
> device driver. For this case I was thinking of adding a
> multiple-consumer thing: both io-pgfaults and the device driver register
> a fault handler. io-pgfault handles recoverable ones and what's left
> goes to the device driver. This could also allow the device driver to be
> notified when io-pgfault doesn't successfully handle a fault
> (handle_mm_fault returns an error).
> 
> Thanks,
> Jean
> 


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-21 Thread Jacob Pan
On Thu, 17 May 2018 23:22:43 +
"Liu, Yi L"  wrote:

> > So as long as in-kernel PRQ handling can do queuing, there is no
> > need for queuing in the host reporting path.  
> 
> Will it affect current interface? Here the handler only get an "evt"
> per a PRQ IRQ. And I suppose vfio needs not rely on host iommu
> queuing?

I don't think it needs iommu driver queuing.


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-21 Thread Jacob Pan
On Thu, 17 May 2018 23:22:43 +
"Liu, Yi L"  wrote:

> > So as long as in-kernel PRQ handling can do queuing, there is no
> > need for queuing in the host reporting path.  
> 
> Will it affect current interface? Here the handler only get an "evt"
> per a PRQ IRQ. And I suppose vfio needs not rely on host iommu
> queuing?

I don't think it needs iommu driver queuing.


RE: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Liu, Yi L
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Thursday, May 17, 2018 11:59 PM
> On Thu, 17 May 2018 11:41:56 +
> "Liu, Yi L"  wrote:
> 
> > > +int iommu_report_device_fault(struct device *dev, struct
> > > +iommu_fault_event *evt) {
> > > + int ret = 0;
> > > + struct iommu_fault_event *evt_pending;
> > > + struct iommu_fault_param *fparam;
> > > +
> > > + /* iommu_param is allocated when device is added to group
> > > */
> > > + if (!dev->iommu_param | !evt)
> > > + return -EINVAL;
> > > + /* we only report device fault if there is a handler
> > > registered */
> > > + mutex_lock(>iommu_param->lock);
> > > + if (!dev->iommu_param->fault_param ||
> > > + !dev->iommu_param->fault_param->handler) {
> > > + ret = -EINVAL;
> > > + goto done_unlock;
> > > + }
> > > + fparam = dev->iommu_param->fault_param;
> > > + if (evt->type == IOMMU_FAULT_PAGE_REQ && evt->last_req) {
> > > + evt_pending = kmemdup(evt, sizeof(struct
> > > iommu_fault_event),
> > > + GFP_KERNEL);
> > > + if (!evt_pending) {
> > > + ret = -ENOMEM;
> > > + goto done_unlock;
> > > + }
> > > + mutex_lock(>lock);
> > > + list_add_tail(_pending->list,
> > > >faults);
> >
> > I may missed it. Here only see list add, how about removing? Who would
> > remove entry from the fault list?
> >
> deletion of the pending event is in page response function (int
> iommu_page_response), once iommu driver finds a matching response for the
> pending request, it will delete the pending event.
> 
> if the response never came, right now we don't delete it, just gives warning.

Got it.

> 
> > > + mutex_unlock(>lock);
> > > + }
> > > + ret = fparam->handler(evt, fparam->data);
> >
> > I remember you mentioned there will be a queue to store the faults.
> > Is it in the fparam->faults list? Or there is no such queue?
> There are two use cases:
> case A: guest SVA, PRQ events are reported outside IOMMU subsystem,
>   e.g. vfio
> case B: in-kernel
> 
> The io page fault queuing is Jean's patchset, mostly for case B (in-kernel IO 
> page
> fault handling). I will convert intel-svm to Jean's io page fault mechanism 
> so that we
> can also have parallel and out of order queuing of PRQ. I still need some 
> time to
> evaluate intel specific needs such as streaming page request/response.
> 
> For case A, there is no queuing in host IOMMU driver. My understanding of the 
> flow
> is as the following:
> 1. host IOMMU receives PRQ
> 2. host IOMMU driver reports PRQ fault event to registered called, i.e.
> vfio
> 3. VFIO reports fault event to QEMU
> 4. QEMU injects PRQ to guest
> 5. Guest IOMMU driver receives PRQ in IRQ 6. Guest IOMMU driver queue PRQ by
> groups, PASID.

Correct.

> So as long as in-kernel PRQ handling can do queuing, there is no need for 
> queuing in
> the host reporting path.

Will it affect current interface? Here the handler only get an "evt" per a PRQ 
IRQ. And I suppose
vfio needs not rely on host iommu queuing?

Thanks,
Yi Liu



RE: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Liu, Yi L
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Thursday, May 17, 2018 11:59 PM
> On Thu, 17 May 2018 11:41:56 +
> "Liu, Yi L"  wrote:
> 
> > > +int iommu_report_device_fault(struct device *dev, struct
> > > +iommu_fault_event *evt) {
> > > + int ret = 0;
> > > + struct iommu_fault_event *evt_pending;
> > > + struct iommu_fault_param *fparam;
> > > +
> > > + /* iommu_param is allocated when device is added to group
> > > */
> > > + if (!dev->iommu_param | !evt)
> > > + return -EINVAL;
> > > + /* we only report device fault if there is a handler
> > > registered */
> > > + mutex_lock(>iommu_param->lock);
> > > + if (!dev->iommu_param->fault_param ||
> > > + !dev->iommu_param->fault_param->handler) {
> > > + ret = -EINVAL;
> > > + goto done_unlock;
> > > + }
> > > + fparam = dev->iommu_param->fault_param;
> > > + if (evt->type == IOMMU_FAULT_PAGE_REQ && evt->last_req) {
> > > + evt_pending = kmemdup(evt, sizeof(struct
> > > iommu_fault_event),
> > > + GFP_KERNEL);
> > > + if (!evt_pending) {
> > > + ret = -ENOMEM;
> > > + goto done_unlock;
> > > + }
> > > + mutex_lock(>lock);
> > > + list_add_tail(_pending->list,
> > > >faults);
> >
> > I may missed it. Here only see list add, how about removing? Who would
> > remove entry from the fault list?
> >
> deletion of the pending event is in page response function (int
> iommu_page_response), once iommu driver finds a matching response for the
> pending request, it will delete the pending event.
> 
> if the response never came, right now we don't delete it, just gives warning.

Got it.

> 
> > > + mutex_unlock(>lock);
> > > + }
> > > + ret = fparam->handler(evt, fparam->data);
> >
> > I remember you mentioned there will be a queue to store the faults.
> > Is it in the fparam->faults list? Or there is no such queue?
> There are two use cases:
> case A: guest SVA, PRQ events are reported outside IOMMU subsystem,
>   e.g. vfio
> case B: in-kernel
> 
> The io page fault queuing is Jean's patchset, mostly for case B (in-kernel IO 
> page
> fault handling). I will convert intel-svm to Jean's io page fault mechanism 
> so that we
> can also have parallel and out of order queuing of PRQ. I still need some 
> time to
> evaluate intel specific needs such as streaming page request/response.
> 
> For case A, there is no queuing in host IOMMU driver. My understanding of the 
> flow
> is as the following:
> 1. host IOMMU receives PRQ
> 2. host IOMMU driver reports PRQ fault event to registered called, i.e.
> vfio
> 3. VFIO reports fault event to QEMU
> 4. QEMU injects PRQ to guest
> 5. Guest IOMMU driver receives PRQ in IRQ 6. Guest IOMMU driver queue PRQ by
> groups, PASID.

Correct.

> So as long as in-kernel PRQ handling can do queuing, there is no need for 
> queuing in
> the host reporting path.

Will it affect current interface? Here the handler only get an "evt" per a PRQ 
IRQ. And I suppose
vfio needs not rely on host iommu queuing?

Thanks,
Yi Liu



Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Jacob Pan
On Thu, 17 May 2018 11:41:56 +
"Liu, Yi L"  wrote:

> > +int iommu_report_device_fault(struct device *dev, struct
> > +iommu_fault_event *evt) {
> > +   int ret = 0;
> > +   struct iommu_fault_event *evt_pending;
> > +   struct iommu_fault_param *fparam;
> > +
> > +   /* iommu_param is allocated when device is added to group
> > */
> > +   if (!dev->iommu_param | !evt)
> > +   return -EINVAL;
> > +   /* we only report device fault if there is a handler
> > registered */
> > +   mutex_lock(>iommu_param->lock);
> > +   if (!dev->iommu_param->fault_param ||
> > +   !dev->iommu_param->fault_param->handler) {
> > +   ret = -EINVAL;
> > +   goto done_unlock;
> > +   }
> > +   fparam = dev->iommu_param->fault_param;
> > +   if (evt->type == IOMMU_FAULT_PAGE_REQ && evt->last_req) {
> > +   evt_pending = kmemdup(evt, sizeof(struct
> > iommu_fault_event),
> > +   GFP_KERNEL);
> > +   if (!evt_pending) {
> > +   ret = -ENOMEM;
> > +   goto done_unlock;
> > +   }
> > +   mutex_lock(>lock);
> > +   list_add_tail(_pending->list,
> > >faults);  
> 
> I may missed it. Here only see list add, how about removing? Who
> would remove entry from the fault list?
> 
deletion of the pending event is in page response function (int
iommu_page_response), once iommu driver finds a matching response for
the pending request, it will delete the pending event.

if the response never came, right now we don't delete it, just gives
warning.

> > +   mutex_unlock(>lock);
> > +   }
> > +   ret = fparam->handler(evt, fparam->data);  
> 
> I remember you mentioned there will be a queue to store the faults.
> Is it in the fparam->faults list? Or there is no such queue?
There are two use cases:
case A: guest SVA, PRQ events are reported outside IOMMU subsystem,
e.g. vfio
case B: in-kernel

The io page fault queuing is Jean's patchset, mostly for case
B (in-kernel IO page fault handling). I will convert intel-svm to Jean's
io page fault mechanism so that we can also have parallel and out of
order queuing of PRQ. I still need some time to evaluate intel specific
needs such as streaming page request/response.

For case A, there is no queuing in host IOMMU driver. My understanding
of the flow is as the following:
1. host IOMMU receives PRQ
2. host IOMMU driver reports PRQ fault event to registered called, i.e.
vfio
3. VFIO reports fault event to QEMU
4. QEMU injects PRQ to guest
5. Guest IOMMU driver receives PRQ in IRQ
6. Guest IOMMU driver queue PRQ by groups, PASID.
So as long as in-kernel PRQ handling can do queuing, there is no need
for queuing in the host reporting path.

Jacob


Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Jacob Pan
On Thu, 17 May 2018 11:41:56 +
"Liu, Yi L"  wrote:

> > +int iommu_report_device_fault(struct device *dev, struct
> > +iommu_fault_event *evt) {
> > +   int ret = 0;
> > +   struct iommu_fault_event *evt_pending;
> > +   struct iommu_fault_param *fparam;
> > +
> > +   /* iommu_param is allocated when device is added to group
> > */
> > +   if (!dev->iommu_param | !evt)
> > +   return -EINVAL;
> > +   /* we only report device fault if there is a handler
> > registered */
> > +   mutex_lock(>iommu_param->lock);
> > +   if (!dev->iommu_param->fault_param ||
> > +   !dev->iommu_param->fault_param->handler) {
> > +   ret = -EINVAL;
> > +   goto done_unlock;
> > +   }
> > +   fparam = dev->iommu_param->fault_param;
> > +   if (evt->type == IOMMU_FAULT_PAGE_REQ && evt->last_req) {
> > +   evt_pending = kmemdup(evt, sizeof(struct
> > iommu_fault_event),
> > +   GFP_KERNEL);
> > +   if (!evt_pending) {
> > +   ret = -ENOMEM;
> > +   goto done_unlock;
> > +   }
> > +   mutex_lock(>lock);
> > +   list_add_tail(_pending->list,
> > >faults);  
> 
> I may missed it. Here only see list add, how about removing? Who
> would remove entry from the fault list?
> 
deletion of the pending event is in page response function (int
iommu_page_response), once iommu driver finds a matching response for
the pending request, it will delete the pending event.

if the response never came, right now we don't delete it, just gives
warning.

> > +   mutex_unlock(>lock);
> > +   }
> > +   ret = fparam->handler(evt, fparam->data);  
> 
> I remember you mentioned there will be a queue to store the faults.
> Is it in the fparam->faults list? Or there is no such queue?
There are two use cases:
case A: guest SVA, PRQ events are reported outside IOMMU subsystem,
e.g. vfio
case B: in-kernel

The io page fault queuing is Jean's patchset, mostly for case
B (in-kernel IO page fault handling). I will convert intel-svm to Jean's
io page fault mechanism so that we can also have parallel and out of
order queuing of PRQ. I still need some time to evaluate intel specific
needs such as streaming page request/response.

For case A, there is no queuing in host IOMMU driver. My understanding
of the flow is as the following:
1. host IOMMU receives PRQ
2. host IOMMU driver reports PRQ fault event to registered called, i.e.
vfio
3. VFIO reports fault event to QEMU
4. QEMU injects PRQ to guest
5. Guest IOMMU driver receives PRQ in IRQ
6. Guest IOMMU driver queue PRQ by groups, PASID.
So as long as in-kernel PRQ handling can do queuing, there is no need
for queuing in the host reporting path.

Jacob


RE: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Liu, Yi L
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Saturday, May 12, 2018 4:54 AM
> 
> Traditionally, device specific faults are detected and handled within their 
> own device
> drivers. When IOMMU is enabled, faults such as DMA related transactions are
> detected by IOMMU. There is no generic reporting mechanism to report faults 
> back
> to the in-kernel device driver or the guest OS in case of assigned devices.
> 
> Faults detected by IOMMU is based on the transaction's source ID which can be
> reported at per device basis, regardless of the device type is a PCI device 
> or not.
> 
> The fault types include recoverable (e.g. page request) and unrecoverable 
> faults(e.g.
> access error). In most cases, faults can be handled by IOMMU drivers 
> internally. The
> primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is assigned 
> to
> guest via vIOMMU. In this case, the first level page tables are owned by the 
> guest.
> Page request must be propagated to the guest to let guest OS fault in the 
> pages then
> send page response. In this mechanism, the direct receiver of IOMMU fault
> notification is VFIO, which can relay notification events to QEMU or other 
> user space
> software.
> 
> 2. faults need more subtle handling by device drivers. Other than simply 
> invoke reset
> function, there are needs to let device driver handle the fault with a 
> smaller impact.
> 
> This patchset is intended to create a generic fault report API such that it 
> can scale as
> follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions summarized
> at https://lwn.net/Articles/608914/.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149
> +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group,
> struct device *dev)
>   goto err_free_name;
>   }
> 
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(>iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
> 
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group,
> struct device *dev)
>   mutex_unlock(>mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(>kobj, "iommu_group");
> 
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
> iommu_group *group,  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> 
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault
> +handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the
> +fault event
> + * and data as argument. The handler should return 0 on success. If the
> +fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(>lock);
> + /* Only allow one fault handler registered for each device */
> +

RE: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-17 Thread Liu, Yi L
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Saturday, May 12, 2018 4:54 AM
> 
> Traditionally, device specific faults are detected and handled within their 
> own device
> drivers. When IOMMU is enabled, faults such as DMA related transactions are
> detected by IOMMU. There is no generic reporting mechanism to report faults 
> back
> to the in-kernel device driver or the guest OS in case of assigned devices.
> 
> Faults detected by IOMMU is based on the transaction's source ID which can be
> reported at per device basis, regardless of the device type is a PCI device 
> or not.
> 
> The fault types include recoverable (e.g. page request) and unrecoverable 
> faults(e.g.
> access error). In most cases, faults can be handled by IOMMU drivers 
> internally. The
> primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is assigned 
> to
> guest via vIOMMU. In this case, the first level page tables are owned by the 
> guest.
> Page request must be propagated to the guest to let guest OS fault in the 
> pages then
> send page response. In this mechanism, the direct receiver of IOMMU fault
> notification is VFIO, which can relay notification events to QEMU or other 
> user space
> software.
> 
> 2. faults need more subtle handling by device drivers. Other than simply 
> invoke reset
> function, there are needs to let device driver handle the fault with a 
> smaller impact.
> 
> This patchset is intended to create a generic fault report API such that it 
> can scale as
> follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions summarized
> at https://lwn.net/Articles/608914/.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149
> +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group,
> struct device *dev)
>   goto err_free_name;
>   }
> 
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(>iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
> 
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group,
> struct device *dev)
>   mutex_unlock(>mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(>kobj, "iommu_group");
> 
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
> iommu_group *group,  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> 
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault
> +handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the
> +fault event
> + * and data as argument. The handler should return 0 on success. If the
> +fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(>lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto 

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-15 Thread Lu Baolu
Hi,

On 05/15/2018 04:55 AM, Jacob Pan wrote:
> On Mon, 14 May 2018 14:01:06 +0800
> Lu Baolu  wrote:
>
>> Hi,
>>
>> On 05/12/2018 04:54 AM, Jacob Pan wrote:
>>> Traditionally, device specific faults are detected and handled
>>> within their own device drivers. When IOMMU is enabled, faults such
>>> as DMA related transactions are detected by IOMMU. There is no
>>> generic reporting mechanism to report faults back to the in-kernel
>>> device driver or the guest OS in case of assigned devices.
>>>
>>> Faults detected by IOMMU is based on the transaction's source ID
>>> which can be reported at per device basis, regardless of the device
>>> type is a PCI device or not.
>>>
>>> The fault types include recoverable (e.g. page request) and
>>> unrecoverable faults(e.g. access error). In most cases, faults can
>>> be handled by IOMMU drivers internally. The primary use cases are as
>>> follows:
>>> 1. page request fault originated from an SVM capable device that is
>>> assigned to guest via vIOMMU. In this case, the first level page
>>> tables are owned by the guest. Page request must be propagated to
>>> the guest to let guest OS fault in the pages then send page
>>> response. In this mechanism, the direct receiver of IOMMU fault
>>> notification is VFIO, which can relay notification events to QEMU
>>> or other user space software.
>>>
>>> 2. faults need more subtle handling by device drivers. Other than
>>> simply invoke reset function, there are needs to let device driver
>>> handle the fault with a smaller impact.
>>>
>>> This patchset is intended to create a generic fault report API such
>>> that it can scale as follows:
>>> - all IOMMU types
>>> - PCI and non-PCI devices
>>> - recoverable and unrecoverable faults
>>> - VFIO and other other in kernel users
>>> - DMA & IRQ remapping (TBD)
>>> The original idea was brought up by David Woodhouse and discussions
>>> summarized at https://lwn.net/Articles/608914/.
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Ashok Raj 
>>> Signed-off-by: Jean-Philippe Brucker 
>>> ---
>>>  drivers/iommu/iommu.c | 149
>>> +-
>>> include/linux/iommu.h |  35 +++- 2 files changed, 181
>>> insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index 3a49b96..b3f9daf 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
>>> *group, struct device *dev) goto err_free_name;
>>> }
>>>  
>>> +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
>>> GFP_KERNEL);
>>> +   if (!dev->iommu_param) {
>>> +   ret = -ENOMEM;
>>> +   goto err_free_name;
>>> +   }
>>> +   mutex_init(>iommu_param->lock);
>>> +
>>> kobject_get(group->devices_kobj);
>>>  
>>> dev->iommu_group = group;
>>> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
>>> *group, struct device *dev) mutex_unlock(>mutex);
>>> dev->iommu_group = NULL;
>>> kobject_put(group->devices_kobj);
>>> +   kfree(dev->iommu_param);
>>>  err_free_name:
>>> kfree(device->name);
>>>  err_remove_link:
>>> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
>>> *dev) sysfs_remove_link(>kobj, "iommu_group");
>>>  
>>> trace_remove_device_from_group(group->id, dev);
>>> -
>>> +   kfree(dev->iommu_param);
>>> kfree(device->name);
>>> kfree(device);
>>> dev->iommu_group = NULL;
>>> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
>>> iommu_group *group,
>>> EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
>>>  /**
>>> + * iommu_register_device_fault_handler() - Register a device fault
>>> handler
>>> + * @dev: the device
>>> + * @handler: the fault handler
>>> + * @data: private data passed as argument to the handler
>>> + *
>>> + * When an IOMMU fault event is received, call this handler with
>>> the fault event
>>> + * and data as argument. The handler should return 0 on success.
>>> If the fault is
>>> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
>>> complete
>>> + * the fault by calling iommu_page_response() with one of the
>>> following
>>> + * response code:
>>> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
>>> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
>>> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
>>> reporting
>>> + *   page faults if possible.
>>> + *
>>> + * Return 0 if the fault handler was installed successfully, or an
>>> error.
>>> + */
>>> +int iommu_register_device_fault_handler(struct device *dev,
>>> +   iommu_dev_fault_handler_t
>>> handler,
>>> +   void *data)
>>> +{
>>> +   struct iommu_param *param = dev->iommu_param;
>>> +   int ret = 0;
>>> +
>>> +   /*
>>> +* Device iommu_param should have been allocated 

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-15 Thread Lu Baolu
Hi,

On 05/15/2018 04:55 AM, Jacob Pan wrote:
> On Mon, 14 May 2018 14:01:06 +0800
> Lu Baolu  wrote:
>
>> Hi,
>>
>> On 05/12/2018 04:54 AM, Jacob Pan wrote:
>>> Traditionally, device specific faults are detected and handled
>>> within their own device drivers. When IOMMU is enabled, faults such
>>> as DMA related transactions are detected by IOMMU. There is no
>>> generic reporting mechanism to report faults back to the in-kernel
>>> device driver or the guest OS in case of assigned devices.
>>>
>>> Faults detected by IOMMU is based on the transaction's source ID
>>> which can be reported at per device basis, regardless of the device
>>> type is a PCI device or not.
>>>
>>> The fault types include recoverable (e.g. page request) and
>>> unrecoverable faults(e.g. access error). In most cases, faults can
>>> be handled by IOMMU drivers internally. The primary use cases are as
>>> follows:
>>> 1. page request fault originated from an SVM capable device that is
>>> assigned to guest via vIOMMU. In this case, the first level page
>>> tables are owned by the guest. Page request must be propagated to
>>> the guest to let guest OS fault in the pages then send page
>>> response. In this mechanism, the direct receiver of IOMMU fault
>>> notification is VFIO, which can relay notification events to QEMU
>>> or other user space software.
>>>
>>> 2. faults need more subtle handling by device drivers. Other than
>>> simply invoke reset function, there are needs to let device driver
>>> handle the fault with a smaller impact.
>>>
>>> This patchset is intended to create a generic fault report API such
>>> that it can scale as follows:
>>> - all IOMMU types
>>> - PCI and non-PCI devices
>>> - recoverable and unrecoverable faults
>>> - VFIO and other other in kernel users
>>> - DMA & IRQ remapping (TBD)
>>> The original idea was brought up by David Woodhouse and discussions
>>> summarized at https://lwn.net/Articles/608914/.
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Ashok Raj 
>>> Signed-off-by: Jean-Philippe Brucker 
>>> ---
>>>  drivers/iommu/iommu.c | 149
>>> +-
>>> include/linux/iommu.h |  35 +++- 2 files changed, 181
>>> insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index 3a49b96..b3f9daf 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
>>> *group, struct device *dev) goto err_free_name;
>>> }
>>>  
>>> +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
>>> GFP_KERNEL);
>>> +   if (!dev->iommu_param) {
>>> +   ret = -ENOMEM;
>>> +   goto err_free_name;
>>> +   }
>>> +   mutex_init(>iommu_param->lock);
>>> +
>>> kobject_get(group->devices_kobj);
>>>  
>>> dev->iommu_group = group;
>>> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
>>> *group, struct device *dev) mutex_unlock(>mutex);
>>> dev->iommu_group = NULL;
>>> kobject_put(group->devices_kobj);
>>> +   kfree(dev->iommu_param);
>>>  err_free_name:
>>> kfree(device->name);
>>>  err_remove_link:
>>> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
>>> *dev) sysfs_remove_link(>kobj, "iommu_group");
>>>  
>>> trace_remove_device_from_group(group->id, dev);
>>> -
>>> +   kfree(dev->iommu_param);
>>> kfree(device->name);
>>> kfree(device);
>>> dev->iommu_group = NULL;
>>> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
>>> iommu_group *group,
>>> EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
>>>  /**
>>> + * iommu_register_device_fault_handler() - Register a device fault
>>> handler
>>> + * @dev: the device
>>> + * @handler: the fault handler
>>> + * @data: private data passed as argument to the handler
>>> + *
>>> + * When an IOMMU fault event is received, call this handler with
>>> the fault event
>>> + * and data as argument. The handler should return 0 on success.
>>> If the fault is
>>> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
>>> complete
>>> + * the fault by calling iommu_page_response() with one of the
>>> following
>>> + * response code:
>>> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
>>> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
>>> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
>>> reporting
>>> + *   page faults if possible.
>>> + *
>>> + * Return 0 if the fault handler was installed successfully, or an
>>> error.
>>> + */
>>> +int iommu_register_device_fault_handler(struct device *dev,
>>> +   iommu_dev_fault_handler_t
>>> handler,
>>> +   void *data)
>>> +{
>>> +   struct iommu_param *param = dev->iommu_param;
>>> +   int ret = 0;
>>> +
>>> +   /*
>>> +* Device iommu_param should have been allocated when
>>> device is
>>> +* added to its iommu_group.
>>> +*/
>>> +   if (!param)
>>> +   return 

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 14:01:06 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:54 AM, Jacob Pan wrote:
> > Traditionally, device specific faults are detected and handled
> > within their own device drivers. When IOMMU is enabled, faults such
> > as DMA related transactions are detected by IOMMU. There is no
> > generic reporting mechanism to report faults back to the in-kernel
> > device driver or the guest OS in case of assigned devices.
> >
> > Faults detected by IOMMU is based on the transaction's source ID
> > which can be reported at per device basis, regardless of the device
> > type is a PCI device or not.
> >
> > The fault types include recoverable (e.g. page request) and
> > unrecoverable faults(e.g. access error). In most cases, faults can
> > be handled by IOMMU drivers internally. The primary use cases are as
> > follows:
> > 1. page request fault originated from an SVM capable device that is
> > assigned to guest via vIOMMU. In this case, the first level page
> > tables are owned by the guest. Page request must be propagated to
> > the guest to let guest OS fault in the pages then send page
> > response. In this mechanism, the direct receiver of IOMMU fault
> > notification is VFIO, which can relay notification events to QEMU
> > or other user space software.
> >
> > 2. faults need more subtle handling by device drivers. Other than
> > simply invoke reset function, there are needs to let device driver
> > handle the fault with a smaller impact.
> >
> > This patchset is intended to create a generic fault report API such
> > that it can scale as follows:
> > - all IOMMU types
> > - PCI and non-PCI devices
> > - recoverable and unrecoverable faults
> > - VFIO and other other in kernel users
> > - DMA & IRQ remapping (TBD)
> > The original idea was brought up by David Woodhouse and discussions
> > summarized at https://lwn.net/Articles/608914/.
> >
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/iommu.c | 149
> > +-
> > include/linux/iommu.h |  35 +++- 2 files changed, 181
> > insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 3a49b96..b3f9daf 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) goto err_free_name;
> > }
> >  
> > +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
> > GFP_KERNEL);
> > +   if (!dev->iommu_param) {
> > +   ret = -ENOMEM;
> > +   goto err_free_name;
> > +   }
> > +   mutex_init(>iommu_param->lock);
> > +
> > kobject_get(group->devices_kobj);
> >  
> > dev->iommu_group = group;
> > @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) mutex_unlock(>mutex);
> > dev->iommu_group = NULL;
> > kobject_put(group->devices_kobj);
> > +   kfree(dev->iommu_param);
> >  err_free_name:
> > kfree(device->name);
> >  err_remove_link:
> > @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
> > *dev) sysfs_remove_link(>kobj, "iommu_group");
> >  
> > trace_remove_device_from_group(group->id, dev);
> > -
> > +   kfree(dev->iommu_param);
> > kfree(device->name);
> > kfree(device);
> > dev->iommu_group = NULL;
> > @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
> > iommu_group *group,
> > EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
> >  /**
> > + * iommu_register_device_fault_handler() - Register a device fault
> > handler
> > + * @dev: the device
> > + * @handler: the fault handler
> > + * @data: private data passed as argument to the handler
> > + *
> > + * When an IOMMU fault event is received, call this handler with
> > the fault event
> > + * and data as argument. The handler should return 0 on success.
> > If the fault is
> > + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
> > complete
> > + * the fault by calling iommu_page_response() with one of the
> > following
> > + * response code:
> > + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> > + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> > + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
> > reporting
> > + *   page faults if possible.
> > + *
> > + * Return 0 if the fault handler was installed successfully, or an
> > error.
> > + */
> > +int iommu_register_device_fault_handler(struct device *dev,
> > +   iommu_dev_fault_handler_t
> > handler,
> > +   void *data)
> > +{
> > +   struct iommu_param *param = dev->iommu_param;
> > +   int ret = 0;
> > +
> > +   /*
> > +* Device iommu_param should have been allocated when
> > device is
> > +* added to its iommu_group.

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-14 Thread Jacob Pan
On Mon, 14 May 2018 14:01:06 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 05/12/2018 04:54 AM, Jacob Pan wrote:
> > Traditionally, device specific faults are detected and handled
> > within their own device drivers. When IOMMU is enabled, faults such
> > as DMA related transactions are detected by IOMMU. There is no
> > generic reporting mechanism to report faults back to the in-kernel
> > device driver or the guest OS in case of assigned devices.
> >
> > Faults detected by IOMMU is based on the transaction's source ID
> > which can be reported at per device basis, regardless of the device
> > type is a PCI device or not.
> >
> > The fault types include recoverable (e.g. page request) and
> > unrecoverable faults(e.g. access error). In most cases, faults can
> > be handled by IOMMU drivers internally. The primary use cases are as
> > follows:
> > 1. page request fault originated from an SVM capable device that is
> > assigned to guest via vIOMMU. In this case, the first level page
> > tables are owned by the guest. Page request must be propagated to
> > the guest to let guest OS fault in the pages then send page
> > response. In this mechanism, the direct receiver of IOMMU fault
> > notification is VFIO, which can relay notification events to QEMU
> > or other user space software.
> >
> > 2. faults need more subtle handling by device drivers. Other than
> > simply invoke reset function, there are needs to let device driver
> > handle the fault with a smaller impact.
> >
> > This patchset is intended to create a generic fault report API such
> > that it can scale as follows:
> > - all IOMMU types
> > - PCI and non-PCI devices
> > - recoverable and unrecoverable faults
> > - VFIO and other other in kernel users
> > - DMA & IRQ remapping (TBD)
> > The original idea was brought up by David Woodhouse and discussions
> > summarized at https://lwn.net/Articles/608914/.
> >
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/iommu.c | 149
> > +-
> > include/linux/iommu.h |  35 +++- 2 files changed, 181
> > insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 3a49b96..b3f9daf 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) goto err_free_name;
> > }
> >  
> > +   dev->iommu_param = kzalloc(sizeof(*dev->iommu_param),
> > GFP_KERNEL);
> > +   if (!dev->iommu_param) {
> > +   ret = -ENOMEM;
> > +   goto err_free_name;
> > +   }
> > +   mutex_init(>iommu_param->lock);
> > +
> > kobject_get(group->devices_kobj);
> >  
> > dev->iommu_group = group;
> > @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group
> > *group, struct device *dev) mutex_unlock(>mutex);
> > dev->iommu_group = NULL;
> > kobject_put(group->devices_kobj);
> > +   kfree(dev->iommu_param);
> >  err_free_name:
> > kfree(device->name);
> >  err_remove_link:
> > @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device
> > *dev) sysfs_remove_link(>kobj, "iommu_group");
> >  
> > trace_remove_device_from_group(group->id, dev);
> > -
> > +   kfree(dev->iommu_param);
> > kfree(device->name);
> > kfree(device);
> > dev->iommu_group = NULL;
> > @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct
> > iommu_group *group,
> > EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); 
> >  /**
> > + * iommu_register_device_fault_handler() - Register a device fault
> > handler
> > + * @dev: the device
> > + * @handler: the fault handler
> > + * @data: private data passed as argument to the handler
> > + *
> > + * When an IOMMU fault event is received, call this handler with
> > the fault event
> > + * and data as argument. The handler should return 0 on success.
> > If the fault is
> > + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
> > complete
> > + * the fault by calling iommu_page_response() with one of the
> > following
> > + * response code:
> > + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> > + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> > + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
> > reporting
> > + *   page faults if possible.
> > + *
> > + * Return 0 if the fault handler was installed successfully, or an
> > error.
> > + */
> > +int iommu_register_device_fault_handler(struct device *dev,
> > +   iommu_dev_fault_handler_t
> > handler,
> > +   void *data)
> > +{
> > +   struct iommu_param *param = dev->iommu_param;
> > +   int ret = 0;
> > +
> > +   /*
> > +* Device iommu_param should have been allocated when
> > device is
> > +* added to its iommu_group.
> > +*/
> > +   if (!param)
> > +   return -EINVAL;
> > +
> > +   mutex_lock(>lock);
> > +   /* 

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-14 Thread Lu Baolu
Hi,

On 05/12/2018 04:54 AM, Jacob Pan wrote:
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
>
> Faults detected by IOMMU is based on the transaction's source ID which
> can be reported at per device basis, regardless of the device type is a
> PCI device or not.
>
> The fault types include recoverable (e.g. page request) and
> unrecoverable faults(e.g. access error). In most cases, faults can be
> handled by IOMMU drivers internally. The primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is
> assigned to guest via vIOMMU. In this case, the first level page tables
> are owned by the guest. Page request must be propagated to the guest to
> let guest OS fault in the pages then send page response. In this
> mechanism, the direct receiver of IOMMU fault notification is VFIO,
> which can relay notification events to QEMU or other user space
> software.
>
> 2. faults need more subtle handling by device drivers. Other than
> simply invoke reset function, there are needs to let device driver
> handle the fault with a smaller impact.
>
> This patchset is intended to create a generic fault report API such
> that it can scale as follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions
> summarized at https://lwn.net/Articles/608914/.
>
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149 
> +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(>iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   mutex_unlock(>mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(>kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct iommu_group 
> *group,
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the fault 
> event
> + * and data as argument. The handler should return 0 on success. If the 
> fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(>lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto 

Re: [PATCH v5 13/23] iommu: introduce device fault report API

2018-05-14 Thread Lu Baolu
Hi,

On 05/12/2018 04:54 AM, Jacob Pan wrote:
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
>
> Faults detected by IOMMU is based on the transaction's source ID which
> can be reported at per device basis, regardless of the device type is a
> PCI device or not.
>
> The fault types include recoverable (e.g. page request) and
> unrecoverable faults(e.g. access error). In most cases, faults can be
> handled by IOMMU drivers internally. The primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is
> assigned to guest via vIOMMU. In this case, the first level page tables
> are owned by the guest. Page request must be propagated to the guest to
> let guest OS fault in the pages then send page response. In this
> mechanism, the direct receiver of IOMMU fault notification is VFIO,
> which can relay notification events to QEMU or other user space
> software.
>
> 2. faults need more subtle handling by device drivers. Other than
> simply invoke reset function, there are needs to let device driver
> handle the fault with a smaller impact.
>
> This patchset is intended to create a generic fault report API such
> that it can scale as follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions
> summarized at https://lwn.net/Articles/608914/.
>
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 149 
> +-
>  include/linux/iommu.h |  35 +++-
>  2 files changed, 181 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3a49b96..b3f9daf 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -609,6 +609,13 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(>iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -639,6 +646,7 @@ int iommu_group_add_device(struct iommu_group *group, 
> struct device *dev)
>   mutex_unlock(>mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -685,7 +693,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(>kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -820,6 +828,145 @@ int iommu_group_unregister_notifier(struct iommu_group 
> *group,
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the fault 
> event
> + * and data as argument. The handler should return 0 on success. If the 
> fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(>lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto done_unlock;
> + }
> +
> + get_device(dev);
> + param->fault_param =
> +