Re: [PATCH v3 15/16] iommu: introduce page response function

2017-12-07 Thread Jacob Pan
On Thu, 7 Dec 2017 12:56:55 +
Jean-Philippe Brucker  wrote:

> On 06/12/17 19:25, Jacob Pan wrote:
> [...]
> >>   For SMMUv3, the stall buffer may be shared between devices on
> >> some implementations, in which case the guest could prevent other
> >> devices to stall by letting the buffer fill up.  
> >>   -> We might have to keep track of stalls in the host driver and
> >> set a credit or timeout to each stall, if it comes to that.  
> >>   -> In addition, send a terminate-all-stalls command when
> >> changing the device's domain.
> >>  
> > We have the same situation in VT-d with shared queue which in turn
> > may affect other guests. Letting host driver maintain record of
> > pending page request seems the best way to go. VT-d has a way to
> > drain PRQ per PASID and RID combination. I guess this is the same
> > as your "terminate-all-stalls" but with finer control? Or
> > "terminate-all-stalls" only applies to a given device.  
> 
> That command terminates all stalls for a given device (for all
> PASIDs). It's a bit awkward to implement but should be enough to
> ensure that we don't leak any outstanding faults to the next VM.
> 
OK. In any case, I think this terminate request should come from the
driver or VFIO, not be initiated by the IOMMU.
> > Seems we can implement a generic timeout/credit mechanism in IOMMU
> > driver with model specific action to drain/terminate. The timeout
> > value can also be model specific.  
> 
> Sounds good. Timeout seems a bit complicated to implement (and how do
> we guess what timeout would work?), so maybe it's simpler to enforce
> a quota of outstanding faults per VM, for example half of the shared
> queue size (the number can be chosen by the IOMMU driver). If a VM
> has that many outstanding faults, then any new fault is immediately
> terminated by the host. A bit rough but it might be enough to
> mitigate the problem initially, and we can always tweak it later (for
> instance disable faulting if a guest doesn't ever reply).
> 
I have to make a correction/clarification: even though VT-d has a
per-IOMMU shared queue for the PRQ, we do not stall. Ashok reminded me
of that. So there is no constraint on the IOMMU if one of the guests
does not respond. All the pressure is on the device, which may have a
limited number of pending page requests.

> Seems like VFIO should enforce this quota, since the IOMMU layer
> doesn't know which device is assigned to which VM. If it's the IOMMU
> that enforces quotas per device and a VM has 15 devices assigned,
> then the guest can still DoS the IOMMU.
> 
I still think a timeout makes more sense than a quota, in that a VM
could be under quota but fail to respond to one of its devices forever.
I agree it is hard to devise a good timeout limit, but since this is to
prevent rare faults, we could pick a relatively large timeout, and we
only need to track the longest-pending request per device. The error
condition we are trying to prevent is not necessarily only stall buffer
overflow but timeout as well, right?
> [...]
> >>> + * @type: group or stream response
> >>
> >> The page request doesn't provide this information
> >>  
> > this is vt-d specific. it is in the vt-d page request descriptor and
> > response descriptors are different depending on the type.
> > Since we intend the generic data to be super set of models, I add
> > this field.  
> 
> But don't you need to add the stream type to enum iommu_fault_type, in
> patch 8? Otherwise the guest can't know what type to set in the
> response.
> 
> Thanks,
> Jean
> 

[Jacob Pan]
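Jacob's proposal above (track only the longest-pending page request per device and let the host terminate when a large timeout expires) can be sketched in standalone C. All names here are invented for illustration, and request/response matching is simplified to a counter rather than the real PRG-index/PASID matching:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state: only the oldest outstanding page
 * request's timestamp is tracked, as suggested above. */
struct dev_prq_state {
	bool pending;            /* any page request outstanding? */
	unsigned long oldest;    /* timestamp of the oldest one */
	unsigned int nr_pending; /* count of outstanding requests */
};

/* Record a new page request; only the oldest timestamp is kept. */
static void prq_inject(struct dev_prq_state *s, unsigned long now)
{
	if (!s->pending) {
		s->pending = true;
		s->oldest = now;
	}
	s->nr_pending++;
}

/* A response retires one request; if none remain, clear tracking.
 * (A real implementation would match responses to requests; this
 * sketch approximates by restarting the clock.) */
static void prq_respond(struct dev_prq_state *s, unsigned long now)
{
	if (s->nr_pending && --s->nr_pending == 0)
		s->pending = false;
	else
		s->oldest = now;
}

/* Host-side check: has the guest sat on a fault for too long? */
static bool prq_timed_out(const struct dev_prq_state *s,
			  unsigned long now, unsigned long limit)
{
	return s->pending && (now - s->oldest) > limit;
}
```

The model-specific part (how to drain or terminate once `prq_timed_out()` fires) would plug in where the check returns true.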
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 15/16] iommu: introduce page response function

2017-12-07 Thread Alex Williamson
On Thu, 7 Dec 2017 12:56:55 +
Jean-Philippe Brucker  wrote:

> On 06/12/17 19:25, Jacob Pan wrote:
> [...]
> >>   For SMMUv3, the stall buffer may be shared between devices on some
> >>   implementations, in which case the guest could prevent other
> >> devices to stall by letting the buffer fill up.  
> >>   -> We might have to keep track of stalls in the host driver and set  
> >> a credit or timeout to each stall, if it comes to that.  
> >>   -> In addition, send a terminate-all-stalls command when changing  
> >> the device's domain.
> >>  
> > We have the same situation in VT-d with shared queue which in turn may
> > affect other guests. Letting host driver maintain record of pending page
> > request seems the best way to go. VT-d has a way to drain PRQ per PASID
> > and RID combination. I guess this is the same as your
> > "terminate-all-stalls" but with finer control? Or
> > "terminate-all-stalls" only applies to a given device.  
> 
> That command terminates all stalls for a given device (for all PASIDs).
> It's a bit awkward to implement but should be enough to ensure that we
> don't leak any outstanding faults to the next VM.
> 
> > Seems we can implement a generic timeout/credit mechanism in IOMMU
> > driver with model specific action to drain/terminate. The timeout value
> > can also be model specific.  
> 
> Sounds good. Timeout seems a bit complicated to implement (and how do we
> guess what timeout would work?), so maybe it's simpler to enforce a quota
> of outstanding faults per VM, for example half of the shared queue size
> (the number can be chosen by the IOMMU driver). If a VM has that many
> outstanding faults, then any new fault is immediately terminated by the
> host. A bit rough but it might be enough to mitigate the problem
> initially, and we can always tweak it later (for instance disable faulting
> if a guest doesn't ever reply).
> 
> Seems like VFIO should enforce this quota, since the IOMMU layer doesn't
> know which device is assigned to which VM. If it's the IOMMU that enforces
> quotas per device and a VM has 15 devices assigned, then the guest can
> still DoS the IOMMU.

VFIO also doesn't know about VMs.  We know that devices attached to the
same container are probably used by the same user, but once we add a
vIOMMU, each device (group) uses its own container and we have no idea
they're associated.  So, no to VM-based accounting; it seems like an
IOMMU problem: X outstanding requests per device.  Thanks,

Alex
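The per-device accounting Alex suggests ("X number of outstanding requests per device") might look roughly like the following minimal sketch; the structure, names, and the forward/terminate enum are assumptions for illustration, not kernel API:

```c
#include <assert.h>

/* Hypothetical per-device fault credit: the IOMMU driver picks a fixed
 * budget of outstanding page requests per device and auto-terminates
 * anything beyond it. */
struct dev_fault_quota {
	unsigned int outstanding;
	unsigned int max;	/* budget chosen by the IOMMU driver */
};

enum fault_action { FAULT_FORWARD, FAULT_TERMINATE };

/* Called when a new page request arrives from the device. */
static enum fault_action fault_inject(struct dev_fault_quota *q)
{
	if (q->outstanding >= q->max)
		return FAULT_TERMINATE;	/* fail the request immediately */
	q->outstanding++;
	return FAULT_FORWARD;		/* relay to the guest via VFIO */
}

/* Called when the guest's page response is relayed back. */
static void fault_complete(struct dev_fault_quota *q)
{
	if (q->outstanding)
		q->outstanding--;
}
```

A misbehaving guest can then starve only its own device, not the shared queue.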


Re: [PATCH v3 15/16] iommu: introduce page response function

2017-12-07 Thread Alex Williamson
On Wed, 6 Dec 2017 11:25:21 -0800
Jacob Pan  wrote:

> On Tue, 5 Dec 2017 17:21:15 +
> Jean-Philippe Brucker  wrote:
> 
> > Hi Jacob,
> > 
> > On 04/12/17 21:37, Jacob Pan wrote:  
> > > On Fri, 24 Nov 2017 12:03:50 +
> > > Jean-Philippe Brucker  wrote:
> > > 
> > >> On 17/11/17 18:55, Jacob Pan wrote:
> > >>> When nested translation is turned on and the guest owns the
> > >>> first-level page tables, device page requests can be forwarded
> > >>> to the guest for fault handling. When the page response is
> > >>> returned by the guest, the IOMMU driver on the host needs to
> > >>> process the response, which informs the device and completes
> > >>> the page request transaction.
> > >>>
> > >>> This patch introduces generic API function for page response
> > >>> passing from the guest or other in-kernel users. The definitions
> > >>> of the generic data is based on PCI ATS specification not limited
> > >>> to any vendor.>
> > >>> Signed-off-by: Jacob Pan 
> > [...]  
> > > I think the simpler interface works for the in-kernel driver use
> > > case very well. But in the case of VFIO, the callback function
> > > does not turn around and send back a page response. The page
> > > response comes from the guest and QEMU, which don't keep track of
> > > the PRQ event data.
> > 
> > Is it safe to trust whatever response the guest or userspace gives
> > us? The answer seems fairly vendor- and device-specific so I wonder
> > if VFIO or IOMMU shouldn't do a bit of sanity checking somewhere, and
> > keep track of all injected page requests.

This is always my question when we start embedding IDs in structures.
> > 
> > From SMMUv3 POV, it seems safe (haven't looked at SMMUv2 but I'm not
> > so confident).
> > 
> > * The guest can only send page responses to devices assigned to it,
> > that's a given.
> >   
> Agreed. The IOMMU driver cannot enforce it; I think the VFIO layer
> can make sure a page response comes from the assigned device and its
> guest/container.

Can we enforce it via the IOMMU/VFIO interface?  If the response is for
a struct device, and not a rid/did embedded in a structure, then vfio
can pass it through w/o worrying about it, ie. the response comes in
via an ioctl with an association to the vfio device fd -> struct
vfio_device -> struct device, and the iommu driver fills in the
rid/did.  Thanks,

Alex
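The fd-keyed flow Alex describes (ioctl → vfio device fd → struct vfio_device → struct device, with the host IOMMU driver stamping the requester ID) can be modeled in a few lines of standalone C. Every type and function name below is a stand-in invented for illustration, not the real VFIO interface:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: userspace identifies the target device only by its fd;
 * the host derives the rid itself, so a malicious guest cannot forge
 * responses for a device it does not own. */
struct device { unsigned int rid; };
struct vfio_device { int fd; struct device *dev; };

struct page_response {
	unsigned int rid;	/* filled by the host, never the guest */
	int code;
};

static struct vfio_device *lookup_by_fd(struct vfio_device *tbl,
					size_t n, int fd)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].fd == fd)
			return &tbl[i];
	return NULL;
}

/* The "ioctl" path: fd -> struct vfio_device -> struct device; the
 * rid comes from the struct device, guest-supplied IDs are ignored. */
static int submit_response(struct vfio_device *tbl, size_t n,
			   int fd, int guest_code,
			   struct page_response *out)
{
	struct vfio_device *vdev = lookup_by_fd(tbl, n, fd);

	if (!vdev)
		return -1;	/* no such device for this user */
	out->rid = vdev->dev->rid;
	out->code = guest_code;
	return 0;
}
```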


Re: [PATCH v3 10/16] iommu: introduce device fault report API

2017-12-07 Thread Alex Williamson
On Fri, 17 Nov 2017 10:55:08 -0800
Jacob Pan  wrote:

> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
> 
> Faults detected by the IOMMU are based on the transaction's source
> ID, which can be reported on a per-device basis, regardless of
> whether the device is a PCI device or not.
> 
> The fault types include recoverable (e.g. page request) and
> unrecoverable faults(e.g. access error). In most cases, faults can be
> handled by IOMMU drivers internally. The primary use cases are as
> follows:
> 1. page request fault originated from an SVM capable device that is
> assigned to guest via vIOMMU. In this case, the first level page tables
> are owned by the guest. Page request must be propagated to the guest to
> let guest OS fault in the pages then send page response. In this
> mechanism, the direct receiver of IOMMU fault notification is VFIO,
> which can relay notification events to QEMU or other user space
> software.
> 
> 2. faults need more subtle handling by device drivers. Other than
> simply invoke reset function, there are needs to let device driver
> handle the fault with a smaller impact.
> 
> This patchset is intended to create a generic fault report API such
> that it can scale as follows:
> - all IOMMU types
> - PCI and non-PCI devices
> - recoverable and unrecoverable faults
> - VFIO and other other in kernel users
> - DMA & IRQ remapping (TBD)
> The original idea was brought up by David Woodhouse and discussions
> summarized at https://lwn.net/Articles/608914/.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/iommu/iommu.c | 63 ++-
>  include/linux/iommu.h | 36 +
>  2 files changed, 98 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 829e9e9..97b7990 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -581,6 +581,12 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -657,7 +663,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(&dev->kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -791,6 +797,61 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *idata = dev->iommu_param;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!idata)
> + return -EINVAL;
> + /* Only allow one fault handler registered for each device */
> + if (idata->fault_param)
> + return -EBUSY;
> + get_device(dev);
> + idata->fault_param =
> + kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
> + if (!idata->fault_param)
> + return -ENOMEM;
> + idata->fault_param->handler = handler;
> + idata->fault_param->data = data;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
> +
> +int iommu_unregister_device_fault_handler(struct device *dev)
> +{
> + struct iommu_param *idata = dev->iommu_param;
> +
> + if (!idata)
> + return -EINVAL;
> +
> + kfree(idata->fault_param);
> + idata->fault_param = NULL;
> + put_device(dev);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
> +
> +
> +int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> +{
> + /* we only report device fault if there is a handler registered */
> + if (!dev->iommu_param || !dev->iommu_param->fault_param ||
> + !dev->iommu_param->fault_param->handler)
> + return -ENOSYS;
> +
> + return dev->iommu_param->fault_param->handler(evt,
> + dev->iommu_param->fault_param->data);
> +}
> 
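For readers following the API above, its control flow can be exercised outside the kernel with minimal stand-ins (the struct layouts, `get_device()`/`put_device()`, and allocator below are stubs; the error codes and checks mirror the quoted patch):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Minimal stand-ins for the kernel types used by the patch. */
struct iommu_fault_event { int dummy; };
typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);

struct iommu_fault_param {
	iommu_dev_fault_handler_t handler;
	void *data;
};
struct iommu_param { struct iommu_fault_param *fault_param; };
struct device { struct iommu_param *iommu_param; int refcount; };

static void get_device(struct device *dev) { dev->refcount++; }

/* Same logic as iommu_register_device_fault_handler() above. */
static int register_fault_handler(struct device *dev,
				  iommu_dev_fault_handler_t handler,
				  void *data)
{
	struct iommu_param *idata = dev->iommu_param;

	if (!idata)
		return -EINVAL;
	if (idata->fault_param)
		return -EBUSY;	/* only one handler per device */
	get_device(dev);
	idata->fault_param = calloc(1, sizeof(*idata->fault_param));
	if (!idata->fault_param)
		return -ENOMEM;
	idata->fault_param->handler = handler;
	idata->fault_param->data = data;
	return 0;
}

/* Same logic as iommu_report_device_fault(). */
static int report_fault(struct device *dev, struct iommu_fault_event *evt)
{
	if (!dev->iommu_param || !dev->iommu_param->fault_param ||
	    !dev->iommu_param->fault_param->handler)
		return -ENOSYS;
	return dev->iommu_param->fault_param->handler(
			evt, dev->iommu_param->fault_param->data);
}

/* A sample consumer handler, e.g. what VFIO might register. */
static int sample_handler(struct iommu_fault_event *evt, void *data)
{
	(void)evt;
	++*(int *)data;		/* count faults relayed to us */
	return 0;
}
```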

Re: [PATCH v3 3/6] iommu/vt-d: Add Intel IOMMU debugfs to show extended context internals

2017-12-07 Thread Mehta, Sohil
On Wed, 2017-12-06 at 16:17 +0800, Lu Baolu wrote:
> Hi,
> 
> On 12/06/2017 11:43 AM, Sohil Mehta wrote:
> > 
> > From: Gayatri Kammela 
> > 
> > +
> > +   if (new_ext) {
> > +   seq_printf(m, "Higher Context tbl entries for Bus: %d\n",
> > +      bus);
> > +   ctx_lo = context[0].lo;
> > +
> > +   if (!(ctx_lo & CONTEXT_PASIDE)) {
> > +   context[1].hi = (u64)virt_to_phys(
> > +   iommu->pasid_state_table);
> > +   context[1].lo = (u64)virt_to_phys(iommu->pasid_table) |
> > +      intel_iommu_get_pts(iommu);
> Why do you change the context entries here?
> 

Thanks for catching this. The context entries were mistakenly getting
changed. We'll remove it.



Re: [PATCH v3 2/6] iommu/vt-d: Add Intel IOMMU debugfs to show context internals

2017-12-07 Thread Mehta, Sohil
On Wed, 2017-12-06 at 16:16 +0800, Lu Baolu wrote:
> Hi,
> 
> On 12/06/2017 11:43 AM, Sohil Mehta wrote:
> > 
> > From: Gayatri Kammela 
> > 
> > 
> > +   seq_printf(m, "%s Context table entries for Bus: %d\n",
> > +      ext ? "Lower" : "", bus);
> > +   seq_printf(m, "[entry]\tDID :B :D .F\tLow\t\tHigh\n");
> WARNING: Prefer seq_puts to seq_printf
> #119: FILE: drivers/iommu/intel-iommu-debug.c:59:
> +seq_printf(m, "[entry]\tDID :B :D .F\tLow\t\tHigh\n");
> 
> (caught by checkpatch.pl)
> 

Hi Lu,

We'll fix this and the other checkpatch.pl warnings.


> > +
> > +static void root_tbl_entry_show(struct seq_file *m, void *unused,
> Why do you define the "unused" parameter which will never been used?
> The same questions to other show functions.
> 

Some functions in our code that are registered with seq_file need to
have an unused parameter, since seq_file.h defines the show function as:
int (*show) (struct seq_file *m, void *v);

But a lot of other functions, including the one you pointed out, don't
need the unused parameter. We'll remove it from those.


> > +void __init intel_iommu_debugfs_init(void)
> > +{
> > +   struct dentry *iommu_debug_root;
> > +
> > +   iommu_debug_root = debugfs_create_dir("intel_iommu", NULL);
> > +
> > +   if (!iommu_debug_root) {
> > +   pr_err("can't create debugfs dir\n");
> I don't think we need a pr_err() here. System works well even
> debugfs_create_dir() returns NULL.
> 
> This is same to all pr_err() in this file.
> 

Would the recommendation be to use pr_warn instead of pr_err, or should
we skip the message altogether?

Thanks,
Sohil

Re: [PATCH v3 15/16] iommu: introduce page response function

2017-12-07 Thread Jean-Philippe Brucker
On 06/12/17 19:25, Jacob Pan wrote:
[...]
>>   For SMMUv3, the stall buffer may be shared between devices on some
>>   implementations, in which case the guest could prevent other
>> devices to stall by letting the buffer fill up.
>>   -> We might have to keep track of stalls in the host driver and set
>> a credit or timeout to each stall, if it comes to that.
>>   -> In addition, send a terminate-all-stalls command when changing
>> the device's domain.
>>
> We have the same situation in VT-d with shared queue which in turn may
> affect other guests. Letting host driver maintain record of pending page
> request seems the best way to go. VT-d has a way to drain PRQ per PASID
> and RID combination. I guess this is the same as your
> "terminate-all-stalls" but with finer control? Or
> "terminate-all-stalls" only applies to a given device.

That command terminates all stalls for a given device (for all PASIDs).
It's a bit awkward to implement but should be enough to ensure that we
don't leak any outstanding faults to the next VM.

> Seems we can implement a generic timeout/credit mechanism in IOMMU
> driver with model specific action to drain/terminate. The timeout value
> can also be model specific.

Sounds good. Timeout seems a bit complicated to implement (and how do we
guess what timeout would work?), so maybe it's simpler to enforce a quota
of outstanding faults per VM, for example half of the shared queue size
(the number can be chosen by the IOMMU driver). If a VM has that many
outstanding faults, then any new fault is immediately terminated by the
host. A bit rough but it might be enough to mitigate the problem
initially, and we can always tweak it later (for instance disable faulting
if a guest doesn't ever reply).

Seems like VFIO should enforce this quota, since the IOMMU layer doesn't
know which device is assigned to which VM. If it's the IOMMU that enforces
quotas per device and a VM has 15 devices assigned, then the guest can
still DoS the IOMMU.

[...]
>>> + * @type: group or stream response  
>>
>> The page request doesn't provide this information
>>
> this is vt-d specific. it is in the vt-d page request descriptor and
> response descriptors are different depending on the type.
> Since we intend the generic data to be super set of models, I add this
> field.

But don't you need to add the stream type to enum iommu_fault_type, in
patch 8? Otherwise the guest can't know what type to set in the response.

Thanks,
Jean



Re: [RFC 2/2] drivers: dma-mapping: parse per device reserved mem at probe time

2017-12-07 Thread Marek Szyprowski

Hi Peng,

On 2017-11-27 10:04, Peng Fan wrote:
> On Mon, Nov 27, 2017 at 09:44:20AM +0100, Marek Szyprowski wrote:
>> On 2017-11-27 09:37, Peng Fan wrote:
>>> On Mon, Nov 27, 2017 at 09:31:00AM +0100, Marek Szyprowski wrote:
>>>> On 2017-11-26 14:13, Peng Fan wrote:
>>>>> Invoke of_reserved_mem_device_init at dma_configure, then
>>>>> there is no need to call of_reserved_mem_device_init in device
>>>>> specific probe function.
>>>>>
>>>>> Signed-off-by: Peng Fan 
>>>>
>>>> This has been already tried long time ago, without success:
>>>> http://patches.linaro.org/patch/33558/
>>>
>>> Thanks for the info. I should first search the mailing list before
>>> sending out patches.
>>
>> It doesn't mean that I'm against such an idea. I just pointed out
>> that I've already tried it. At that time, however, there was no
>> dma_configure() function yet, which seems to be a better place for
>> of_reserved_mem_device_init().
>>
>> I would however always call of_dma_configure(), even when a reserved
>> mem node is there. IIRC on ARM64 that function configures dma_ops,
>> without which no DMA is possible at all.
>
> So, you prefer this?
>     if (dma_dev->of_node) {
> +       of_reserved_mem_device_init(dev);
>         ret = of_dma_configure(dev, dma_dev->of_node);
>
> However, in of_reserved_mem_device_init there is also a call to
> of_dma_configure:
>
> "
> /* ensure that dma_ops is set for virtual devices
>  * using reserved memory
>  */
>  ret = of_dma_configure(dev, np);
> "
>
> If we always call of_dma_configure, of_dma_configure may be called
> twice.

Right, I forgot about this.

> I just checked more. of_reserved_mem_device_init only handles the
> first memory-region. For nodes which have multiple memory-region
> entries, it seems the 2nd and later ones cannot be handled, such as
> in drivers/media/platform/s5p-mfc/s5p_mfc.c.

Well, maybe automatic assignment should be done only when there is only
one reserved region set? In case more than one region is assigned to a
device, the driver has to create virtual child devices and configure
DMA for them to be able to let the dma-mapping API use those reserved
regions. It looks like the call to of_dma_configure(dev, np) can be
moved back to the driver (it was not possible at the time that code was
merged due to missing export symbols).

Configuring more than one reserved memory region for a given device
might then be moved to some helper function to have common code for
that across the drivers.
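One possible shape for such a common helper, assuming it simply loops over of_reserved_mem_device_init_by_idx() until the node runs out of memory-region entries (the helper name is invented, and the OF call is stubbed below so the sketch is self-contained and testable outside the kernel):

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for the OF/reserved-mem types; in the kernel these come
 * from <linux/of_reserved_mem.h>. The stub just counts attachments. */
struct device_node { int nr_regions; };
struct device { struct device_node *of_node; int assigned; };

static int of_reserved_mem_device_init_by_idx(struct device *dev,
					      struct device_node *np,
					      int idx)
{
	if (idx >= np->nr_regions)
		return -ENODEV;	/* no memory-region entry at idx */
	dev->assigned++;	/* pretend region idx was attached */
	return 0;
}

/* Hypothetical common helper: initialize every memory-region entry of
 * a node for a device, instead of only the first one. */
static int reserved_mem_device_init_all(struct device *dev,
					struct device_node *np)
{
	int idx, ret;

	for (idx = 0; ; idx++) {
		ret = of_reserved_mem_device_init_by_idx(dev, np, idx);
		if (ret)
			break;
	}
	return idx ? 0 : ret;	/* ok if at least one region attached */
}
```

In the real kernel, handling of the 2nd and later regions would still need the virtual-child-device arrangement described above, since the DMA coherent pool is per device.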

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
