RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
> From: David Gibson
> Sent: Wednesday, September 29, 2021 12:56 PM
>
> > Unlike vfio, iommufd adopts a device-centric design with all group
> > logistics hidden behind the fd. Binding a device to iommufd serves
> > as the contract to get security context established (and vice versa
> > for unbinding). One additional requirement in iommufd is to manage the
> > switch between multiple security contexts due to decoupled bind/attach:
> >
> > 1) Open a device in "/dev/vfio/devices" with user access blocked;
>
> Probably worth clarifying that (1) must happen for *all* devices in
> the group before (2) happens for any device in the group.

No. User access is naturally blocked for other devices as long as they
are not opened yet.

> > 2) Bind the device to an iommufd with an initial security context
> >    (an empty iommu domain which blocks dma) established for its
> >    group, with user access unblocked;
> >
> > 3) Attach the device to a user-specified ioasid (shared by all devices
> >    attached to this ioasid). Before attaching, the device should be first
> >    detached from the initial context;
>
> So, this step can implicitly but observably change the behaviour for
> other devices in the group as well. I don't love that kind of
> difficult to predict side effect, which is why I'm *still* not totally
> convinced by the device-centric model.

Which side-effect is predicted here? The user anyway needs to be aware
of such group restriction, regardless of whether it uses the group or
nongroup interface.

> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 5ea3a007fd7c..bffd84e978fb 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -45,6 +45,8 @@ struct iommu_group {
> >  	struct iommu_domain *default_domain;
> >  	struct iommu_domain *domain;
> >  	struct list_head entry;
> > +	unsigned long user_dma_owner_id;
>
> Using an opaque integer doesn't seem like a good idea. I think you
> probably want a pointer to a suitable struct dma_owner or the like
> (you could have one embedded in each iommufd struct, plus a global
> static kernel_default_owner).

For the remaining comments you may want to look at the latest discussion
here:

https://lore.kernel.org/kvm/20210928140712.gl964...@nvidia.com/

It relies on a driver core change to manage group ownership gracefully.
No BUG_ON() is triggered any more for driver binding. There a fd will be
passed in to mark the ownership.

Thanks
Kevin
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 03/20] vfio: Add vfio_[un]register_device()
> From: David Gibson
> Sent: Wednesday, September 29, 2021 10:44 AM
>
> > One alternative option is to arrange device nodes in sub-directories based
> > on the device type. But doing so also adds one trouble to userspace. The
> > current vfio uAPI is designed to have the user query device type via
> > VFIO_DEVICE_GET_INFO after opening the device. With this option the user
> > instead needs to figure out the device type before opening the device, to
> > identify the sub-directory.
>
> Wouldn't this be up to the operator / configuration, rather than the
> actual software though? I would assume that typically the VFIO
> program would be pointed at a specific vfio device node file to use,
> e.g.
>
>     my-vfio-prog -d /dev/vfio/pci/:0a:03.1
>
> Or more generally, if you're expecting userspace to know a name in a
> unique pattern, they can equally well know a "type/name" pair.

You are correct. Currently:

    -device vfio-pci,host=:BB:DD.F
    -device vfio-pci,sysfsdev=/sys/bus/pci/devices/:BB:DD.F
    -device vfio-platform,sysfsdev=/sys/bus/platform/devices/PNP0103:00

The above is definitely type/name information to find the related node.
Actually even for Jason's proposal we still need such information to
identify the sysfs path. Then I feel a type-based sub-directory does
work. Adding another link to sysfs sounds unnecessary now. But I'm not
sure whether we still want to create the /dev/vfio/devices/vfio0 thing
and the related udev rule thing that you pointed out in another mail.
Jason?

Thanks
Kevin
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On Sun, Sep 19, 2021 at 02:38:34PM +0800, Liu Yi L wrote:
> From: Lu Baolu
>
> This extends iommu core to manage security context for passthrough
> devices. Please bear a long explanation for how we reach this design
> instead of managing it solely in iommufd like what vfio does today.
>
> Devices which cannot be isolated from each other are organized into an
> iommu group. When a device is assigned to the user space, the entire
> group must be put in a security context so that user-initiated DMAs via
> the assigned device cannot harm the rest of the system. No user access
> should be granted on a device before the security context is established
> for the group which the device belongs to.
>
> Managing the security context must meet below criteria:
>
> 1) The group is viable for user-initiated DMAs. This implies that the
>    devices in the group must be either bound to a device-passthrough
>    framework, or driver-less, or bound to a driver which is known safe
>    (not do DMA).
>
> 2) The security context should only allow DMA to the user's memory and
>    devices in this group;
>
> 3) After the security context is established for the group, the group
>    viability must be continuously monitored before the user relinquishes
>    all devices belonging to the group. The viability might be broken e.g.
>    when a driver-less device is later bound to a driver which does DMA.
>
> 4) The security context should not be destroyed before user access
>    permission is withdrawn.
>
> Existing vfio introduces explicit container/group semantics in its uAPI
> to meet above requirements. A single security context (iommu domain)
> is created per container. Attaching group to container moves the entire
> group into the associated security context, and vice versa. The user can
> open the device only after group attach. A group can be detached only
> after all devices in the group are closed. Group viability is monitored
> by listening to iommu group events.
> Unlike vfio, iommufd adopts a device-centric design with all group
> logistics hidden behind the fd. Binding a device to iommufd serves
> as the contract to get security context established (and vice versa
> for unbinding). One additional requirement in iommufd is to manage the
> switch between multiple security contexts due to decoupled bind/attach:
>
> 1) Open a device in "/dev/vfio/devices" with user access blocked;

Probably worth clarifying that (1) must happen for *all* devices in
the group before (2) happens for any device in the group.

> 2) Bind the device to an iommufd with an initial security context
>    (an empty iommu domain which blocks dma) established for its
>    group, with user access unblocked;
>
> 3) Attach the device to a user-specified ioasid (shared by all devices
>    attached to this ioasid). Before attaching, the device should be first
>    detached from the initial context;

So, this step can implicitly but observably change the behaviour for
other devices in the group as well. I don't love that kind of
difficult-to-predict side effect, which is why I'm *still* not totally
convinced by the device-centric model.

> 4) Detach the device from the ioasid and switch it back to the initial
>    security context;

Same non-local side effect at this step, of course. Btw, explicitly
naming the "no DMA" context is probably a good idea, rather than
referring to the "initial security context" (it's "initial" from the
PoV of the iommufd, but not from the PoV of the device fd, which was
likely bound to the default kernel context before (2)).

> 5) Unbind the device from the iommufd, back to access blocked state and
>    move its group out of the initial security context if it's the last
>    unbound device in the group;

Maybe worth clarifying that again: (5) must happen for all devices in
the group before rebinding any devices to regular kernel drivers.

> (multiple attach/detach could happen between 2 and 5).
> However existing iommu core has problem with above transition. Detach
> in step 3/4 makes the device/group re-attached to the default domain
> automatically, which opens the door for user-initiated DMAs to attack
> the rest of the system. The existing vfio doesn't have this problem as
> it combines 2/3 in one step (so does 4/5).
>
> Fixing this problem requires the iommu core to also participate in the
> security context management. Following this direction we also move group
> viability check into the iommu core, which allows iommufd to stay fully
> device-centric w/o keeping any group knowledge (combining with the
> extension to iommu_at[de]tach_device() in a latter patch).
>
> Basically two new interfaces are provided:
>
>     int iommu_device_init_user_dma(struct device *dev,
>                                    unsigned long owner);
>     void iommu_device_exit_user_dma(struct device *dev);
>
> iommufd calls them respectively when handling device binding/unbinding
> requests.
>
> The
RE: [RFC 03/20] vfio: Add vfio_[un]register_device()
> From: David Gibson
> Sent: Wednesday, September 29, 2021 10:44 AM
>
> > One open about how to organize the device nodes under
> > /dev/vfio/devices/. This RFC adopts a simple policy by keeping a flat
> > layout with mixed devname from all kinds of devices. The prerequisite
> > of this model is that devnames from different bus types are unique
> > formats:
> >
> >     /dev/vfio/devices/:00:14.2 (pci)
> >     /dev/vfio/devices/PNP0103:00 (platform)
> >     /dev/vfio/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 (mdev)
>
> Oof. I really don't think this is a good idea. Ensuring that a
> format is "unique" in the sense that it can't collide with any of the
> other formats, for *every* value of the parameters on both sides is
> actually pretty complicated in general.
>
> I think per-type sub-directories would be helpful here; Jason's
> suggestion of just sequential numbers would work as well.

We'll follow Jason's suggestion in the next version.

> > +	/*
> > +	 * Refcounting can't start until the driver calls register. Don't
> > +	 * start twice when the device is exposed in both group and
> > +	 * nongroup interfaces.
> > +	 */
> > +	if (!refcount_read(&device->refcount))
>
> Is there a possible race here with something getting in and
> incrementing the refcount between the read and set?

This will not be required in the next version, which will always create
both group and nongroup interfaces for every device (then let the driver
provide a .bind_iommufd() callback to decide whether the nongroup
interface is functional). It will be centrally processed within the
existing vfio_[un]register_group_dev(), thus the above race is not a
concern any more.

> > +		refcount_set(&device->refcount, 1);
> >
> >  	mutex_lock(&group->device_lock);
> >  	list_add(&device->group_next, &group->device_list);
> > @@ -804,7 +810,78 @@ int vfio_register_group_dev(struct vfio_device *device)
> >
> >  	return 0;
> >  }
> > -EXPORT_SYMBOL_GPL(vfio_register_group_dev);
> > +
> > +static int __vfio_register_nongroup_dev(struct vfio_device *device)
> > +{
> > +	struct vfio_device *existing_device;
> > +	struct device *dev;
> > +	int ret = 0, minor;
> > +
> > +	mutex_lock(&vfio.device_lock);
> > +	list_for_each_entry(existing_device, &vfio.device_list, vfio_next) {
> > +		if (existing_device == device) {
> > +			ret = -EBUSY;
> > +			goto out_unlock;
>
> This indicates a bug in the caller, doesn't it? Should it be a BUG or
> WARN instead?

This call is initiated by userspace. Per Jason's suggestion we don't
even need to check it; then no lock is required.

Thanks
Kevin
Re: [RFC 03/20] vfio: Add vfio_[un]register_device()
On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote:
> With /dev/vfio/devices introduced, now a vfio device driver has three
> options to expose its device to userspace:
>
> a) only legacy group interface, for devices which haven't been moved to
>    iommufd (e.g. platform devices, sw mdev, etc.);
>
> b) both legacy group interface and new device-centric interface, for
>    devices which support iommufd but also want to keep backward
>    compatibility (e.g. pci devices in this RFC);
>
> c) only new device-centric interface, for new devices which don't carry
>    backward compatibility burden (e.g. hw mdev/subdev with pasid);
>
> This patch introduces vfio_[un]register_device() helpers for the device
> drivers to specify the device exposure policy to vfio core. Hence the
> existing vfio_[un]register_group_dev() become the wrapper of the new
> helper functions. The new device-centric interface is described as
> 'nongroup' to differentiate from existing 'group' stuff.
>
> TBD: this patch needs to rebase on top of below series from Christoph in
> next version.
>
>     "cleanup vfio iommu_group creation"
>
> Legacy userspace continues to follow the legacy group interface.
>
> Newer userspace can first try the new device-centric interface if the
> device is present under /dev/vfio/devices. Otherwise fall back to the
> group interface.
>
> One open about how to organize the device nodes under /dev/vfio/devices/.
> This RFC adopts a simple policy by keeping a flat layout with mixed devname
> from all kinds of devices. The prerequisite of this model is that devnames
> from different bus types are unique formats:
>
>     /dev/vfio/devices/:00:14.2 (pci)
>     /dev/vfio/devices/PNP0103:00 (platform)
>     /dev/vfio/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 (mdev)

Oof. I really don't think this is a good idea. Ensuring that a format
is "unique" in the sense that it can't collide with any of the other
formats, for *every* value of the parameters on both sides is actually
pretty complicated in general.

I think per-type sub-directories would be helpful here; Jason's
suggestion of just sequential numbers would work as well.

> One alternative option is to arrange device nodes in sub-directories based
> on the device type. But doing so also adds one trouble to userspace. The
> current vfio uAPI is designed to have the user query device type via
> VFIO_DEVICE_GET_INFO after opening the device. With this option the user
> instead needs to figure out the device type before opening the device, to
> identify the sub-directory.

Wouldn't this be up to the operator / configuration, rather than the
actual software though? I would assume that typically the VFIO program
would be pointed at a specific vfio device node file to use, e.g.

    my-vfio-prog -d /dev/vfio/pci/:0a:03.1

Or more generally, if you're expecting userspace to know a name in a
unique pattern, they can equally well know a "type/name" pair.

> Another tricky thing is that "pdev. vs. mdev" and "pci vs. platform vs.
> ccw,..." are orthogonal categorizations. Need more thoughts on whether
> both or just one category should be used to define the sub-directories.
>
> Signed-off-by: Liu Yi L
> ---
>  drivers/vfio/vfio.c  | 137 +++
>  include/linux/vfio.h |   9 +++
>  2 files changed, 134 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 84436d7abedd..1e87b25962f1 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -51,6 +51,7 @@ static struct vfio {
>  	struct cdev		device_cdev;
>  	dev_t			device_devt;
>  	struct mutex		device_lock;
> +	struct list_head	device_list;
>  	struct idr		device_idr;
>  } vfio;
>
> @@ -757,7 +758,7 @@ void vfio_init_group_dev(struct vfio_device *device, struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(vfio_init_group_dev);
>
> -int vfio_register_group_dev(struct vfio_device *device)
> +static int __vfio_register_group_dev(struct vfio_device *device)
>  {
>  	struct vfio_device *existing_device;
>  	struct iommu_group *iommu_group;
> @@ -794,8 +795,13 @@ int vfio_register_group_dev(struct vfio_device *device)
>  	/* Our reference on group is moved to the device */
>  	device->group = group;
>
> -	/* Refcounting can't start until the driver calls register */
> -	refcount_set(&device->refcount, 1);
> +	/*
> +	 * Refcounting can't start until the driver calls register. Don't
> +	 * start twice when the device is exposed in both group and nongroup
> +	 * interfaces.
> +	 */
> +	if (!refcount_read(&device->refcount))

Is there a possible race here with something getting in and
incrementing the refcount between the read and set?

> +		refcount_set(&device->refcount, 1);
>
>  	mutex_lock(&group->device_lock);
>  	list_add(&device->group_next,
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> userspace to directly open a vfio device w/o relying on container/group
> (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> iommufd (more specifically in iommu core by this RFC) in a device-centric
> manner.
>
> In case a device is exposed in both legacy and new interfaces (see next
> patch for how to decide it), this patch also ensures that when the device
> is already opened via one interface then the other one must be blocked.
>
> Signed-off-by: Liu Yi L

[snip]

> +static bool vfio_device_in_container(struct vfio_device *device)
> +{
> +	return !!(device->group && device->group->container);

You don't need !! here. && is already a logical operation, so returns
a valid bool.

> +}
> +
>  static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  {
>  	struct vfio_device *device = filep->private_data;
> @@ -1560,7 +1691,16 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>
>  	module_put(device->dev->driver->owner);
>
> -	vfio_group_try_dissolve_container(device->group);
> +	if (vfio_device_in_container(device)) {
> +		vfio_group_try_dissolve_container(device->group);
> +	} else {
> +		atomic_dec(&device->opened);
> +		if (device->group) {
> +			mutex_lock(&device->group->opened_lock);
> +			device->group->opened--;
> +			mutex_unlock(&device->group->opened_lock);
> +		}
> +	}
>
>  	vfio_device_put(device);
>
> @@ -1613,6 +1753,7 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>
>  static const struct file_operations vfio_device_fops = {
>  	.owner		= THIS_MODULE,
> +	.open		= vfio_device_fops_open,
>  	.release	= vfio_device_fops_release,
>  	.read		= vfio_device_fops_read,
>  	.write		= vfio_device_fops_write,
> @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = {
>  	.mode = S_IRUGO | S_IWUGO,
>  };
>
> +static char *vfio_device_devnode(struct device *dev, umode_t *mode)
> +{
> +	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));

Others have pointed out some problems with the use of dev_name() here.
I'll add that I think you'll make things much easier if instead of
using one huge "devices" subdir, you use a separate subdir for each
vfio sub-driver (so, one for PCI, one for each type of mdev, one for
platform, etc.). That should make avoiding name conflicts a lot
simpler.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [RFC 03/20] vfio: Add vfio_[un]register_device()
On Tue, Sep 21, 2021 at 10:00:14PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 22, 2021 at 12:54:02AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe
> > > Sent: Wednesday, September 22, 2021 12:01 AM
> > >
> > > > One open about how to organize the device nodes under
> > > > /dev/vfio/devices/. This RFC adopts a simple policy by keeping a
> > > > flat layout with mixed devname from all kinds of devices. The
> > > > prerequisite of this model is that devnames from different bus
> > > > types are unique formats:
> > >
> > > This isn't reliable, the devname should just be vfio0, vfio1, etc
> > >
> > > The userspace can learn the correct major/minor by inspecting the
> > > sysfs.
> > >
> > > This whole concept should disappear into the prior patch that adds the
> > > struct device in the first place, and I think most of the code here
> > > can be deleted once the struct device is used properly.
> >
> > Can you help elaborate above flow? This is one area where we need
> > more guidance.
> >
> > When Qemu accepts an option "-device vfio-pci,host=:BB:DD.F",
> > how does Qemu identify which vfio0/1/... is associated with the
> > specified :BB:DD.F?
>
> When done properly in the kernel the file:
>
>     /sys/bus/pci/devices/:BB:DD.F/vfio/vfioX/dev
>
> Will contain the major:minor of the VFIO device.
>
> Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> that the major:minor matches.
>
> In the above pattern "pci" and ":BB:DD.FF" are the arguments passed
> to qemu.

I thought part of the appeal of the device-centric model was less
grovelling around in sysfs for information. Using type/address
directly in /dev seems simpler than having to dig around matching
things here.

Note that this doesn't have to be done in kernel: you could have the
kernel just call them /dev/vfio/devices/vfio0, ... but add udev rules
that create symlinks from say /dev/vfio/pci/:BB:SS.F ->
../devices/vfioXX based on the sysfs information.

> You can look at this for some general over-engineered code to handle
> opening from a sysfs handle like above:
>
> https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
>
> Jason

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
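The udev-symlink scheme David describes could look roughly like the rule below. This is a hypothetical sketch, not from any posted patch: it assumes the vfio char devices appear in a "vfio" subsystem and uses udev's `$id` substitution (the name of the parent device matched by SUBSYSTEMS) to build the type/address symlink.

```
# /etc/udev/rules.d/99-vfio.rules (hypothetical sketch)
# Kernel names the nodes /dev/vfio/devices/vfioN; udev adds
# /dev/vfio/<bus>/<address> symlinks pointing at them.
SUBSYSTEM=="vfio", SUBSYSTEMS=="pci",      SYMLINK+="vfio/pci/$id"
SUBSYSTEM=="vfio", SUBSYSTEMS=="platform", SYMLINK+="vfio/platform/$id"
```

With such rules in place, a management tool can be pointed at a stable type/address path while the kernel keeps the simple sequential naming Jason prefers.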
Re: [RFC 04/20] iommu: Add iommu_device_get_info interface
On Sun, Sep 19, 2021 at 02:38:32PM +0800, Liu Yi L wrote:
> From: Lu Baolu
>
> This provides an interface for upper layers to get the per-device iommu
> attributes.
>
>     int iommu_device_get_info(struct device *dev,
>                               enum iommu_devattr attr, void *data);

The fact that this interface doesn't let you know how to size the data
buffer, other than by just knowing the right size for each attr,
concerns me.

> The first attribute (IOMMU_DEV_INFO_FORCE_SNOOP) is added. It tells if
> the iommu can force DMA to snoop cache. At this stage, only PCI devices
> which have this attribute set could use the iommufd; this is due to
> supporting no-snoop DMA requires additional refactoring work on the
> current kvm-vfio contract. The following patch will have vfio check this
> attribute to decide whether a pci device can be exposed through
> /dev/vfio/devices.
>
> Signed-off-by: Lu Baolu
> ---
>  drivers/iommu/iommu.c | 16 ++++++++++++++++
>  include/linux/iommu.h | 19 +++++++++++++++++++
>  2 files changed, 35 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 63f0af10c403..5ea3a007fd7c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3260,3 +3260,19 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
>
>  	return ret;
>  }
> +
> +/* Expose per-device iommu attributes. */
> +int iommu_device_get_info(struct device *dev, enum iommu_devattr attr, void *data)
> +{
> +	const struct iommu_ops *ops;
> +
> +	if (!dev->bus || !dev->bus->iommu_ops)
> +		return -EINVAL;
> +
> +	ops = dev->bus->iommu_ops;
> +	if (unlikely(!ops->device_info))
> +		return -ENODEV;
> +
> +	return ops->device_info(dev, attr, data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_device_get_info);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 32d448050bf7..52a6d33c82dc 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -150,6 +150,14 @@ enum iommu_dev_features {
>  	IOMMU_DEV_FEAT_IOPF,
>  };
>
> +/**
> + * enum iommu_devattr - Per device IOMMU attributes
> + * @IOMMU_DEV_INFO_FORCE_SNOOP [bool]: IOMMU can force DMA to be snooped.
> + */
> +enum iommu_devattr {
> +	IOMMU_DEV_INFO_FORCE_SNOOP,
> +};
> +
>  #define IOMMU_PASID_INVALID	(-1U)
>
>  #ifdef CONFIG_IOMMU_API
> @@ -215,6 +223,7 @@ struct iommu_iotlb_gather {
>   *	- IOMMU_DOMAIN_IDENTITY: must use an identity domain
>   *	- IOMMU_DOMAIN_DMA: must use a dma domain
>   *	- 0: use the default setting
>   * @device_info: query per-device iommu attributes
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   * @owner: Driver module providing these ops
>   */
> @@ -283,6 +292,8 @@ struct iommu_ops {
>
>  	int (*def_domain_type)(struct device *dev);
>
> +	int (*device_info)(struct device *dev, enum iommu_devattr attr, void *data);
> +
>  	unsigned long pgsize_bitmap;
>  	struct module *owner;
>  };
> @@ -604,6 +615,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>  void iommu_sva_unbind_device(struct iommu_sva *handle);
>  u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>
> +int iommu_device_get_info(struct device *dev, enum iommu_devattr attr, void *data);
> +
>  #else /* CONFIG_IOMMU_API */
>
>  struct iommu_ops {};
> @@ -999,6 +1012,12 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
>  {
>  	return NULL;
>  }
> +
> +static inline int iommu_device_get_info(struct device *dev,
> +					enum iommu_devattr type, void *data)
> +{
> +	return -ENODEV;
> +}
>  #endif /* CONFIG_IOMMU_API */
>
>  /**

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On 9/29/21 10:29 AM, Tian, Kevin wrote:
>> From: Lu Baolu
>> Sent: Wednesday, September 29, 2021 10:22 AM
>>
>> On 9/28/21 10:07 PM, Jason Gunthorpe wrote:
>>> On Tue, Sep 28, 2021 at 09:35:05PM +0800, Lu Baolu wrote:
>>>> Another issue is, when putting a device into user-dma mode, all
>>>> devices belonging to the same iommu group shouldn't be bound with a
>>>> kernel-dma driver. Kevin's prototype checks this by
>>>> READ_ONCE(dev->driver). This is not lock safe as discussed below,
>>>>
>>>> https://lore.kernel.org/linux-iommu/20210927130935.gz964...@nvidia.com/
>>>>
>>>> Any guidance on this?
>>>
>>> Something like this?
>>>
>>> int iommu_set_device_dma_owner(struct device *dev,
>>> 			       enum device_dma_owner mode,
>>> 			       struct file *user_owner)
>>> {
>>> 	struct iommu_group *group = group_from_dev(dev);
>>>
>>> 	spin_lock(&iommu_group->dma_owner_lock);
>>> 	switch (mode) {
>>> 	case DMA_OWNER_KERNEL:
>>> 		if (iommu_group->dma_users[DMA_OWNER_USERSPACE])
>>> 			return -EBUSY;
>>> 		break;
>>> 	case DMA_OWNER_SHARED:
>>> 		break;
>>> 	case DMA_OWNER_USERSPACE:
>>> 		if (iommu_group->dma_users[DMA_OWNER_KERNEL])
>>> 			return -EBUSY;
>>> 		if (iommu_group->dma_owner_file != user_owner) {
>>> 			if (iommu_group->dma_users[DMA_OWNER_USERSPACE])
>>> 				return -EPERM;
>>> 			get_file(user_owner);
>>> 			iommu_group->dma_owner_file = user_owner;
>>> 		}
>>> 		break;
>>> 	default:
>>> 		spin_unlock(&iommu_group->dma_owner_lock);
>>> 		return -EINVAL;
>>> 	}
>>> 	iommu_group->dma_users[mode]++;
>>> 	spin_unlock(&iommu_group->dma_owner_lock);
>>> 	return 0;
>>> }
>>>
>>> int iommu_release_device_dma_owner(struct device *dev,
>>> 				   enum device_dma_owner mode)
>>> {
>>> 	struct iommu_group *group = group_from_dev(dev);
>>>
>>> 	spin_lock(&iommu_group->dma_owner_lock);
>>> 	if (WARN_ON(!iommu_group->dma_users[mode]))
>>> 		goto err_unlock;
>>> 	if (!iommu_group->dma_users[mode]--) {
>>> 		if (mode == DMA_OWNER_USERSPACE) {
>>> 			fput(iommu_group->dma_owner_file);
>>> 			iommu_group->dma_owner_file = NULL;
>>> 		}
>>> 	}
>>> err_unlock:
>>> 	spin_unlock(&iommu_group->dma_owner_lock);
>>> }
>>>
>>> Where, the driver core does before probe:
>>>
>>> 	iommu_set_device_dma_owner(dev, DMA_OWNER_KERNEL, NULL)
>>>
>>> pci_stub/etc does in their probe func:
>>>
>>> 	iommu_set_device_dma_owner(dev, DMA_OWNER_SHARED, NULL)
>>>
>>> And vfio/iommufd does when a struct vfio_device FD is attached:
>>>
>>> 	iommu_set_device_dma_owner(dev, DMA_OWNER_USERSPACE,
>>> 				   group_file/iommu_file)
>>
>> Really good design. It also helps alleviating some pains elsewhere in
>> the iommu core.
>>
>> Just a nit comment, we also need DMA_OWNER_NONE which will be set when
>> the driver core unbinds the driver from the device.
>
> Not necessarily. NONE is represented by none of dma_users[mode] being
> valid.

Fair enough.

Best regards,
baolu
[PATCH v8 7/7] dma-iommu: account for min_align_mask w/swiotlb
From: David Stevens

Pass the non-aligned size to __iommu_dma_map when using swiotlb bounce
buffers in iommu_dma_map_page, to account for min_align_mask.

To deal with granule alignment, __iommu_dma_map maps iova_align(size +
iova_off) bytes starting at phys - iova_off. If iommu_dma_map_page
passes aligned size when using swiotlb, then this becomes
iova_align(iova_align(orig_size) + iova_off). Normally iova_off will be
zero when using swiotlb. However, this is not the case for devices that
set min_align_mask. When iova_off is non-zero, __iommu_dma_map ends up
mapping an extra page at the end of the buffer. Beyond just being a
security issue, the extra page is not cleaned up by __iommu_dma_unmap.
This causes problems when the IOVA is reused, due to collisions in the
iommu driver. Just passing the original size is sufficient, since
__iommu_dma_map will take care of granule alignment.

Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
Signed-off-by: David Stevens
---
 drivers/iommu/dma-iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 289c49ead01a..342359727a59 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -806,7 +806,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
-	size_t aligned_size = size;
 	dma_addr_t iova, dma_mask = dma_get_mask(dev);
 
 	/*
@@ -815,7 +814,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 */
 	if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) {
 		void *padding_start;
-		size_t padding_size;
+		size_t padding_size, aligned_size;
 
 		aligned_size = iova_align(iovad, size);
 		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
@@ -840,7 +839,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		arch_sync_dma_for_device(phys, size, dir);
 
-	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
+	iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
 	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 	return iova;
-- 
2.33.0.685.g46640cef36-goog
[PATCH v8 6/7] swiotlb: support aligned swiotlb buffers
From: David Stevens

Add an argument to swiotlb_tbl_map_single that specifies the desired
alignment of the allocated buffer. This is used by dma-iommu to ensure
the buffer is aligned to the iova granule size when using swiotlb with
untrusted sub-granule mappings. This addresses an issue where adjacent
slots could be exposed to the untrusted device if IO_TLB_SIZE < iova
granule < PAGE_SIZE.

Signed-off-by: David Stevens
Reviewed-by: Christoph Hellwig
---
 drivers/iommu/dma-iommu.c |  4 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  3 ++-
 kernel/dma/swiotlb.c      | 13 -
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 85a005b268f6..289c49ead01a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -818,8 +818,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		size_t padding_size;
 
 		aligned_size = iova_align(iovad, size);
-		phys = swiotlb_tbl_map_single(dev, phys, size,
-					      aligned_size, dir, attrs);
+		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
+					      iova_mask(iovad), dir, attrs);
 		if (phys == DMA_MAPPING_ERROR)
 			return DMA_MAPPING_ERROR;
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e56a5faac395..cbdff8979980 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -380,7 +380,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	 */
 	trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
 
-	map = swiotlb_tbl_map_single(dev, phys, size, size, dir, attrs);
+	map = swiotlb_tbl_map_single(dev, phys, size, size, 0, dir, attrs);
 	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b0cb2a9973f4..569272871375 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -45,7 +45,8 @@ extern void __init swiotlb_update_mem_attributes(void);
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
 		size_t mapping_size, size_t alloc_size,
-		enum dma_data_direction dir, unsigned long attrs);
+		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
+		unsigned long attrs);
 
 extern void swiotlb_tbl_unmap_single(struct device *hwdev,
 				     phys_addr_t tlb_addr,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 87c40517e822..019672b3da1d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -459,7 +459,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * allocate a buffer from that IO TLB pool.
  */
 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
-			      size_t alloc_size)
+			      size_t alloc_size, unsigned int alloc_align_mask)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -483,6 +483,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
 	stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
 	if (alloc_size >= PAGE_SIZE)
 		stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
+	stride = max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
 
 	spin_lock_irqsave(&mem->lock, flags);
 	if (unlikely(nslots > mem->nslabs - mem->used))
@@ -541,7 +542,8 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
 
 phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		size_t mapping_size, size_t alloc_size,
-		enum dma_data_direction dir, unsigned long attrs)
+		unsigned int alloc_align_mask, enum dma_data_direction dir,
+		unsigned long attrs)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
@@ -561,7 +563,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		return (phys_addr_t)DMA_MAPPING_ERROR;
 	}
 
-	index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
+	index = swiotlb_find_slots(dev, orig_addr,
+				   alloc_size + offset, alloc_align_mask);
 	if (index == -1) {
 		if (!(attrs & DMA_ATTR_NO_WARN))
 			dev_warn_ratelimited(dev,
@@ -675,7 +678,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
 
 	trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size, swiotlb_force);
 
-
[PATCH v8 5/7] dma-iommu: Check CONFIG_SWIOTLB more broadly
From: David Stevens Introduce a new dev_use_swiotlb function to guard swiotlb code, instead of overloading dev_is_untrusted. This allows CONFIG_SWIOTLB to be checked more broadly, so the swiotlb related code can be removed more aggressively. Signed-off-by: David Stevens Reviewed-by: Robin Murphy Reviewed-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 4f77c44eaf14..85a005b268f6 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -317,6 +317,11 @@ static bool dev_is_untrusted(struct device *dev) return dev_is_pci(dev) && to_pci_dev(dev)->untrusted; } +static bool dev_use_swiotlb(struct device *dev) +{ + return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev); +} + /* sysfs updates are serialised by the mutex of the group owning @domain */ int iommu_dma_init_fq(struct iommu_domain *domain) { @@ -731,7 +736,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev, { phys_addr_t phys; - if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev)) + if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev)) return; phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle); @@ -747,7 +752,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev, { phys_addr_t phys; - if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev)) + if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev)) return; phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle); @@ -765,7 +770,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg; int i; - if (dev_is_untrusted(dev)) + if (dev_use_swiotlb(dev)) for_each_sg(sgl, sg, nelems, i) iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg), sg->length, dir); @@ -781,7 +786,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg; int i; - if (dev_is_untrusted(dev)) + if (dev_use_swiotlb(dev)) for_each_sg(sgl, 
sg, nelems, i) iommu_dma_sync_single_for_device(dev, sg_dma_address(sg), @@ -808,8 +813,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, * If both the physical buffer start address and size are * page aligned, we don't need to use a bounce page. */ - if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) && - iova_offset(iovad, phys | size)) { + if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | size)) { void *padding_start; size_t padding_size; @@ -994,7 +998,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, goto out; } - if (dev_is_untrusted(dev)) + if (dev_use_swiotlb(dev)) return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs); if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) @@ -1072,7 +1076,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, struct scatterlist *tmp; int i; - if (dev_is_untrusted(dev)) { + if (dev_use_swiotlb(dev)) { iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs); return; } -- 2.33.0.685.g46640cef36-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 4/7] dma-iommu: fold _swiotlb helpers into callers
From: David Stevens Fold the _swiotlb helper functions into the respective _page functions, since recent fixes have moved all logic from the _page functions to the _swiotlb functions. Signed-off-by: David Stevens Reviewed-by: Christoph Hellwig Reviewed-by: Robin Murphy --- drivers/iommu/dma-iommu.c | 135 +- 1 file changed, 59 insertions(+), 76 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 027b489714b7..4f77c44eaf14 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -510,26 +510,6 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr, iommu_dma_free_iova(cookie, dma_addr, size, _gather); } -static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr, - size_t size, enum dma_data_direction dir, - unsigned long attrs) -{ - struct iommu_domain *domain = iommu_get_dma_domain(dev); - phys_addr_t phys; - - phys = iommu_iova_to_phys(domain, dma_addr); - if (WARN_ON(!phys)) - return; - - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev)) - arch_sync_dma_for_cpu(phys, size, dir); - - __iommu_dma_unmap(dev, dma_addr, size); - - if (unlikely(is_swiotlb_buffer(dev, phys))) - swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs); -} - static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, size_t size, int prot, u64 dma_mask) { @@ -556,55 +536,6 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, return iova + iova_off; } -static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys, - size_t org_size, dma_addr_t dma_mask, bool coherent, - enum dma_data_direction dir, unsigned long attrs) -{ - int prot = dma_info_to_prot(dir, coherent, attrs); - struct iommu_domain *domain = iommu_get_dma_domain(dev); - struct iommu_dma_cookie *cookie = domain->iova_cookie; - struct iova_domain *iovad = >iovad; - size_t aligned_size = org_size; - void *padding_start; - size_t padding_size; - dma_addr_t iova; - - /* -* If 
both the physical buffer start address and size are -* page aligned, we don't need to use a bounce page. -*/ - if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) && - iova_offset(iovad, phys | org_size)) { - aligned_size = iova_align(iovad, org_size); - phys = swiotlb_tbl_map_single(dev, phys, org_size, - aligned_size, dir, attrs); - - if (phys == DMA_MAPPING_ERROR) - return DMA_MAPPING_ERROR; - - /* Cleanup the padding area. */ - padding_start = phys_to_virt(phys); - padding_size = aligned_size; - - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && - (dir == DMA_TO_DEVICE || -dir == DMA_BIDIRECTIONAL)) { - padding_start += org_size; - padding_size -= org_size; - } - - memset(padding_start, 0, padding_size); - } - - if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - arch_sync_dma_for_device(phys, org_size, dir); - - iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask); - if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys)) - swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs); - return iova; -} - static void __iommu_dma_free_pages(struct page **pages, int count) { while (count--) @@ -866,15 +797,68 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, { phys_addr_t phys = page_to_phys(page) + offset; bool coherent = dev_is_dma_coherent(dev); + int prot = dma_info_to_prot(dir, coherent, attrs); + struct iommu_domain *domain = iommu_get_dma_domain(dev); + struct iommu_dma_cookie *cookie = domain->iova_cookie; + struct iova_domain *iovad = >iovad; + size_t aligned_size = size; + dma_addr_t iova, dma_mask = dma_get_mask(dev); + + /* +* If both the physical buffer start address and size are +* page aligned, we don't need to use a bounce page. 
+*/ + if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) && + iova_offset(iovad, phys | size)) { + void *padding_start; + size_t padding_size; + + aligned_size = iova_align(iovad, size); + phys = swiotlb_tbl_map_single(dev, phys, size, + aligned_size, dir, attrs); + + if (phys == DMA_MAPPING_ERROR) + return DMA_MAPPING_ERROR; - return __iommu_dma_map_swiotlb(dev, phys,
[PATCH v8 3/7] dma-iommu: skip extra sync during unmap w/swiotlb
From: David Stevens Calling the iommu_dma_sync_*_for_cpu functions during unmap can cause two copies out of the swiotlb buffer. Do the arch sync directly in __iommu_dma_unmap_swiotlb instead to avoid this. This makes the call to iommu_dma_sync_sg_for_cpu for untrusted devices in iommu_dma_unmap_sg no longer necessary, so move that invocation later in the function. Signed-off-by: David Stevens Reviewed-by: Christoph Hellwig Reviewed-by: Robin Murphy --- drivers/iommu/dma-iommu.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 19bebacbf178..027b489714b7 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -521,6 +521,9 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr, if (WARN_ON(!phys)) return; + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev)) + arch_sync_dma_for_cpu(phys, size, dir); + __iommu_dma_unmap(dev, dma_addr, size); if (unlikely(is_swiotlb_buffer(dev, phys))) @@ -871,8 +874,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction dir, unsigned long attrs) { - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir); __iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs); } @@ -1088,14 +1089,14 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, struct scatterlist *tmp; int i; - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir); - if (dev_is_untrusted(dev)) { iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs); return; } + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir); + /* * The scatterlist segments are mapped into a single * contiguous IOVA allocation, so this is incredibly easy. 
-- 2.33.0.685.g46640cef36-goog
[PATCH v8 2/7] dma-iommu: fix arch_sync_dma for map
From: David Stevens When calling arch_sync_dma, we need to pass it the memory that's actually being used for dma. When using swiotlb bounce buffers, this is the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb helper, so it can use the bounce buffer address if necessary. Now that iommu_dma_map_sg delegates to a function which takes care of architectural syncing in the untrusted device case, the call to iommu_dma_sync_sg_for_device can be moved so it only occurs for trusted devices. Doing the sync for untrusted devices before mapping never really worked, since it needs to be able to target swiotlb buffers. This also moves the architectural sync to before the call to __iommu_dma_map, to guarantee that untrusted devices can't see stale data they shouldn't see. Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") Signed-off-by: David Stevens Reviewed-by: Christoph Hellwig Reviewed-by: Robin Murphy --- drivers/iommu/dma-iommu.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index c4d205b63c58..19bebacbf178 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -593,6 +593,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys, memset(padding_start, 0, padding_size); } + if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + arch_sync_dma_for_device(phys, org_size, dir); + iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask); if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys)) swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs); @@ -860,14 +863,9 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, { phys_addr_t phys = page_to_phys(page) + offset; bool coherent = dev_is_dma_coherent(dev); - dma_addr_t dma_handle; - dma_handle = __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev), + return __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev), 
coherent, dir, attrs); - if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && - dma_handle != DMA_MAPPING_ERROR) - arch_sync_dma_for_device(phys, size, dir); - return dma_handle; } static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, @@ -1012,12 +1010,12 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, goto out; } - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - iommu_dma_sync_sg_for_device(dev, sg, nents, dir); - if (dev_is_untrusted(dev)) return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + iommu_dma_sync_sg_for_device(dev, sg, nents, dir); + /* * Work out how much IOVA space we need, and align the segments to * IOVA granules for the IOMMU driver to handle. With some clever -- 2.33.0.685.g46640cef36-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 1/7] dma-iommu: fix sync_sg with swiotlb
From: David Stevens The is_swiotlb_buffer function takes the physical address of the swiotlb buffer, not the physical address of the original buffer. The sglist contains the physical addresses of the original buffer, so for the sync_sg functions to work properly when a bounce buffer might have been used, we need to use iommu_iova_to_phys to look up the physical address. This is what sync_single does, so call that function on each sglist segment. The previous code mostly worked because swiotlb does the transfer on map and unmap. However, any callers which use DMA_ATTR_SKIP_CPU_SYNC with sglists or which call sync_sg would not have had anything copied to the bounce buffer. Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") Signed-off-by: David Stevens Reviewed-by: Robin Murphy Reviewed-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 33 + 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 896bea04c347..c4d205b63c58 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -828,17 +828,13 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg; int i; - if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev)) - return; - - for_each_sg(sgl, sg, nelems, i) { - if (!dev_is_dma_coherent(dev)) + if (dev_is_untrusted(dev)) + for_each_sg(sgl, sg, nelems, i) + iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg), + sg->length, dir); + else if (!dev_is_dma_coherent(dev)) + for_each_sg(sgl, sg, nelems, i) arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir); - - if (is_swiotlb_buffer(dev, sg_phys(sg))) - swiotlb_sync_single_for_cpu(dev, sg_phys(sg), - sg->length, dir); - } } static void iommu_dma_sync_sg_for_device(struct device *dev, @@ -848,17 +844,14 @@ static void iommu_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg; int i; - if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev)) - return; - - for_each_sg(sgl, 
sg, nelems, i) { - if (is_swiotlb_buffer(dev, sg_phys(sg))) - swiotlb_sync_single_for_device(dev, sg_phys(sg), - sg->length, dir); - - if (!dev_is_dma_coherent(dev)) + if (dev_is_untrusted(dev)) + for_each_sg(sgl, sg, nelems, i) + iommu_dma_sync_single_for_device(dev, +sg_dma_address(sg), +sg->length, dir); + else if (!dev_is_dma_coherent(dev)) + for_each_sg(sgl, sg, nelems, i) arch_sync_dma_for_device(sg_phys(sg), sg->length, dir); - } } static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, -- 2.33.0.685.g46640cef36-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 0/7] Fixes for dma-iommu swiotlb bounce buffers
From: David Stevens

This patch set includes various fixes for dma-iommu's swiotlb bounce buffers for untrusted devices. The min_align_mask issue was found when running fio on an untrusted nvme device with bs=512. The other issues were found via code inspection, so I don't have any specific use cases where things were not working, nor any concrete performance numbers.

There are two issues related to min_align_mask that this patch series does not attempt to fix. First, it does not address the case where min_align_mask is larger than the IOVA granule. Doing so requires changes to IOVA allocation, and is not specific to when swiotlb bounce buffers are used. This is not a problem in practice today, since the only driver which uses min_align_mask is nvme, which sets it to 4096.

The second issue this series does not address is the fact that extra swiotlb slots adjacent to a bounce buffer can be exposed to untrusted devices whose drivers use min_align_mask. Fixing this requires being able to allocate padding slots at the beginning of a swiotlb allocation. This is a rather significant change that I am not comfortable making. Without being able to handle this, there is also little point to clearing the padding at the start of such a buffer, since we can only clear based on (IO_TLB_SIZE - 1) instead of iova_mask.
v7 -> v8:
- Rebase on v5.15-rc3 and resolve conflicts with restricted dma

v6 -> v7:
- Remove unsafe attempt to clear padding at start of swiotlb buffer
- Rewrite commit message for min_align_mask commit to better explain the problem it's fixing
- Rebase on iommu/core
- Acknowledge unsolved issues in cover letter

v5 -> v6:
- Remove unnecessary line break
- Remove redundant config check

v4 -> v5:
- Fix xen build error
- Move _swiotlb refactor into its own patch

v3 -> v4:
- Fold _swiotlb functions into _page functions
- Add patch to align swiotlb buffer to iovad granule
- Combine if checks in iommu_dma_sync_sg_* functions

v2 -> v3:
- Add new patch to address min_align_mask bug
- Set SKIP_CPU_SYNC flag after syncing in map/unmap
- Properly call arch_sync_dma_for_cpu in iommu_dma_sync_sg_for_cpu

v1 -> v2:
- Split fixes into dedicated patches
- Less invasive changes to fix arch_sync when mapping
- Leave dev_is_untrusted check for strict iommu

David Stevens (7):
  dma-iommu: fix sync_sg with swiotlb
  dma-iommu: fix arch_sync_dma for map
  dma-iommu: skip extra sync during unmap w/swiotlb
  dma-iommu: fold _swiotlb helpers into callers
  dma-iommu: Check CONFIG_SWIOTLB more broadly
  swiotlb: support aligned swiotlb buffers
  dma-iommu: account for min_align_mask w/swiotlb

 drivers/iommu/dma-iommu.c | 188 +-
 drivers/xen/swiotlb-xen.c |   2 +-
 include/linux/swiotlb.h   |   3 +-
 kernel/dma/swiotlb.c      |  13 ++-
 4 files changed, 94 insertions(+), 112 deletions(-)

-- 2.33.0.685.g46640cef36-goog
RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
> From: Lu Baolu > Sent: Wednesday, September 29, 2021 10:22 AM > > On 9/28/21 10:07 PM, Jason Gunthorpe wrote: > > On Tue, Sep 28, 2021 at 09:35:05PM +0800, Lu Baolu wrote: > >> Another issue is, when putting a device into user-dma mode, all devices > >> belonging to the same iommu group shouldn't be bound with a kernel- > dma > >> driver. Kevin's prototype checks this by READ_ONCE(dev->driver). This is > >> not lock safe as discussed below, > >> > >> https://lore.kernel.org/linux- > iommu/20210927130935.gz964...@nvidia.com/ > >> > >> Any guidance on this? > > > > Something like this? > > > > > > int iommu_set_device_dma_owner(struct device *dev, enum > device_dma_owner mode, > >struct file *user_owner) > > { > > struct iommu_group *group = group_from_dev(dev); > > > > spin_lock(_group->dma_owner_lock); > > switch (mode) { > > case DMA_OWNER_KERNEL: > > if (iommu_group- > >dma_users[DMA_OWNER_USERSPACE]) > > return -EBUSY; > > break; > > case DMA_OWNER_SHARED: > > break; > > case DMA_OWNER_USERSPACE: > > if (iommu_group- > >dma_users[DMA_OWNER_KERNEL]) > > return -EBUSY; > > if (iommu_group->dma_owner_file != user_owner) { > > if (iommu_group- > >dma_users[DMA_OWNER_USERSPACE]) > > return -EPERM; > > get_file(user_owner); > > iommu_group->dma_owner_file = > user_owner; > > } > > break; > > default: > > spin_unlock(_group->dma_owner_lock); > > return -EINVAL; > > } > > iommu_group->dma_users[mode]++; > > spin_unlock(_group->dma_owner_lock); > > return 0; > > } > > > > int iommu_release_device_dma_owner(struct device *dev, > >enum device_dma_owner mode) > > { > > struct iommu_group *group = group_from_dev(dev); > > > > spin_lock(_group->dma_owner_lock); > > if (WARN_ON(!iommu_group->dma_users[mode])) > > goto err_unlock; > > if (!iommu_group->dma_users[mode]--) { > > if (mode == DMA_OWNER_USERSPACE) { > > fput(iommu_group->dma_owner_file); > > iommu_group->dma_owner_file = NULL; > > } > > } > > err_unlock: > > spin_unlock(_group->dma_owner_lock); > > } > > > > > > 
Where, the driver core does before probe: > > > > iommu_set_device_dma_owner(dev, DMA_OWNER_KERNEL, NULL) > > > > pci_stub/etc does in their probe func: > > > > iommu_set_device_dma_owner(dev, DMA_OWNER_SHARED, NULL) > > > > And vfio/iommfd does when a struct vfio_device FD is attached: > > > > iommu_set_device_dma_owner(dev, DMA_OWNER_USERSPACE, > group_file/iommu_file) > > Really good design. It also helps alleviating some pains elsewhere in > the iommu core. > > Just a nit comment, we also need DMA_OWNER_NONE which will be set > when > the driver core unbinds the driver from the device. > Not necessarily. NONE is represented by none of dma_user[mode] is valid. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On 9/28/21 10:07 PM, Jason Gunthorpe wrote: On Tue, Sep 28, 2021 at 09:35:05PM +0800, Lu Baolu wrote: Another issue is, when putting a device into user-dma mode, all devices belonging to the same iommu group shouldn't be bound with a kernel-dma driver. Kevin's prototype checks this by READ_ONCE(dev->driver). This is not lock safe as discussed below, https://lore.kernel.org/linux-iommu/20210927130935.gz964...@nvidia.com/ Any guidance on this? Something like this? int iommu_set_device_dma_owner(struct device *dev, enum device_dma_owner mode, struct file *user_owner) { struct iommu_group *group = group_from_dev(dev); spin_lock(_group->dma_owner_lock); switch (mode) { case DMA_OWNER_KERNEL: if (iommu_group->dma_users[DMA_OWNER_USERSPACE]) return -EBUSY; break; case DMA_OWNER_SHARED: break; case DMA_OWNER_USERSPACE: if (iommu_group->dma_users[DMA_OWNER_KERNEL]) return -EBUSY; if (iommu_group->dma_owner_file != user_owner) { if (iommu_group->dma_users[DMA_OWNER_USERSPACE]) return -EPERM; get_file(user_owner); iommu_group->dma_owner_file = user_owner; } break; default: spin_unlock(_group->dma_owner_lock); return -EINVAL; } iommu_group->dma_users[mode]++; spin_unlock(_group->dma_owner_lock); return 0; } int iommu_release_device_dma_owner(struct device *dev, enum device_dma_owner mode) { struct iommu_group *group = group_from_dev(dev); spin_lock(_group->dma_owner_lock); if (WARN_ON(!iommu_group->dma_users[mode])) goto err_unlock; if (!iommu_group->dma_users[mode]--) { if (mode == DMA_OWNER_USERSPACE) { fput(iommu_group->dma_owner_file); iommu_group->dma_owner_file = NULL; } } err_unlock: spin_unlock(_group->dma_owner_lock); } Where, the driver core does before probe: iommu_set_device_dma_owner(dev, DMA_OWNER_KERNEL, NULL) pci_stub/etc does in their probe func: iommu_set_device_dma_owner(dev, DMA_OWNER_SHARED, NULL) And vfio/iommfd does when a struct vfio_device FD is attached: iommu_set_device_dma_owner(dev, DMA_OWNER_USERSPACE, group_file/iommu_file) Really good design. 
It also helps alleviating some pains elsewhere in the iommu core. Just a nit comment, we also need DMA_OWNER_NONE which will be set when the driver core unbinds the driver from the device. Jason Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
>> if (!(xsave->header.xfeatures & fmask)) {
>>         xsave->header.xfeatures |= fmask; //<
>>         xsaves(xsave, fmask);
>> }
>
> I'm not sure why the FPU state is initialized here.
>
> For updating the PASID state, it's unnecessary to init the PASID state.
>
> Maybe it is necessary in other cases?

Dave had suggested initializing feature state when it is unknown (could be garbage). This is my attempt to follow that guidance.

I'm not confident that my tests for "is the state in registers, in memory, or is garbage" really capture all the cases.

-Tony
RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
> But the helpers seem to be generic. They take "task" as a parameter and
> handle the task as non-current case. So the helpers are not for PASID only.
> They may be used by others to modify a running task's FPU state. But
> it's not safe to do so.
>
> At least need some comments/restriction for the helpers to be used on
> a running task?

Fenghua,

Correct. When I add some comments I'll make it very clear that it is the responsibility of the caller to make sure that the task that is targeted cannot run.

Earlier in this thread Dave suggested there are two cases where these helpers might be useful:

1) Fork/clone - to set up some xsave state in the child ... but this would be done before the child is allowed to run.

2) ptrace - this is a "maybe" because ptrace already has code to handle all the xsave state as a single entity. Perhaps someone might want to change it to only modify a single feature ... but this seems unlikely. In any case the ptrace code already "stops" the target process while it is reading/writing state.

-Tony
RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
Hi, Tony,

> void *begin_update_one_xsave_feature(struct task_struct *tsk,
>                                      enum xfeature xfeature, bool full)
> {
>         struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
>         struct xregs_state *xinit = &init_fpstate.xsave;
>         u64 fmask = 1ull << xfeature;
>         void *addr;
>
>         BUG_ON(!(xsave->header.xcomp_bv & fmask));
>
>         fpregs_lock();
>
>         addr = __raw_xsave_addr(xsave, xfeature);
>
>         if (full || tsk != current) {
>                 memcpy(addr, __raw_xsave_addr(xinit, xfeature),
>                        xstate_sizes[xfeature]);
>                 goto out;
>         }
>
>         if (!(xsave->header.xfeatures & fmask)) {
>                 xsave->header.xfeatures |= fmask; //<
>                 xsaves(xsave, fmask);
>         }

I'm not sure why the FPU state is initialized here.

For updating the PASID state, it's unnecessary to init the PASID state.

Maybe it is necessary in other cases?

> out:
>         xsave->header.xfeatures |= fmask;

Setting the xfeatures bit plus updating the PASID state is enough to restore the PASID state to the IA32_PASID MSR.

>         return addr;
> }
>
> void finish_update_one_xsave_feature(struct task_struct *tsk)
> {
>         set_ti_thread_flag(task_thread_info(tsk), TIF_NEED_FPU_LOAD);
>         if (tsk == current) //<
>                 __cpu_invalidate_fpregs_state(); //<
>         fpregs_unlock();
> }

Thanks.

-Fenghua
Re: [PATCH v2 9/9] iommu/vt-d: Use pci core's DVSEC functionality
Hi Dan, On 9/29/21 1:54 AM, Dan Williams wrote: On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky wrote: Reduce maintenance burden of DVSEC query implementation by using the centralized PCI core implementation. Cc: iommu@lists.linux-foundation.org Cc: David Woodhouse Cc: Lu Baolu Signed-off-by: Ben Widawsky --- drivers/iommu/intel/iommu.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index d75f59ae28e6..30c97181f0ae 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5398,20 +5398,7 @@ static int intel_iommu_disable_sva(struct device *dev) */ static int siov_find_pci_dvsec(struct pci_dev *pdev) { - int pos; - u16 vendor, id; - - pos = pci_find_next_ext_capability(pdev, 0, 0x23); - while (pos) { - pci_read_config_word(pdev, pos + 4, ); - pci_read_config_word(pdev, pos + 8, ); - if (vendor == PCI_VENDOR_ID_INTEL && id == 5) - return pos; - - pos = pci_find_next_ext_capability(pdev, pos, 0x23); - } - - return 0; + return pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_INTEL, 5); } Same comments as the CXL patch, siov_find_pci_dvsec() doesn't seem to have a reason to exist anymore. What is 5? "5" is DVSEC ID for Scalable IOV. Anyway, the siov_find_pci_dvsec() has been dead code since commit 262948f8ba57 ("iommu: Delete iommu_dev_has_feature()"). I have a patch to clean it up. No need to care about it in this series. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 12/12] arm64: dts: mediatek: Get rid of mediatek, larb for MM nodes
After adding device_link between the IOMMU consumer and smi, the mediatek,larb is unnecessary now. CC: Matthias Brugger Signed-off-by: Yong Wu Reviewed-by: Evan Green --- arch/arm64/boot/dts/mediatek/mt8173.dtsi | 16 arch/arm64/boot/dts/mediatek/mt8183.dtsi | 6 -- 2 files changed, 22 deletions(-) diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi b/arch/arm64/boot/dts/mediatek/mt8173.dtsi index d9e005ae5bb0..205c221696a6 100644 --- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi @@ -1009,7 +1009,6 @@ < CLK_MM_MUTEX_32K>; power-domains = < MT8173_POWER_DOMAIN_MM>; iommus = < M4U_PORT_MDP_RDMA0>; - mediatek,larb = <>; mediatek,vpu = <>; }; @@ -1020,7 +1019,6 @@ < CLK_MM_MUTEX_32K>; power-domains = < MT8173_POWER_DOMAIN_MM>; iommus = < M4U_PORT_MDP_RDMA1>; - mediatek,larb = <>; }; mdp_rsz0: rsz@14003000 { @@ -1050,7 +1048,6 @@ clocks = < CLK_MM_MDP_WDMA>; power-domains = < MT8173_POWER_DOMAIN_MM>; iommus = < M4U_PORT_MDP_WDMA>; - mediatek,larb = <>; }; mdp_wrot0: wrot@14007000 { @@ -1059,7 +1056,6 @@ clocks = < CLK_MM_MDP_WROT0>; power-domains = < MT8173_POWER_DOMAIN_MM>; iommus = < M4U_PORT_MDP_WROT0>; - mediatek,larb = <>; }; mdp_wrot1: wrot@14008000 { @@ -1068,7 +1064,6 @@ clocks = < CLK_MM_MDP_WROT1>; power-domains = < MT8173_POWER_DOMAIN_MM>; iommus = < M4U_PORT_MDP_WROT1>; - mediatek,larb = <>; }; ovl0: ovl@1400c000 { @@ -1078,7 +1073,6 @@ power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_OVL0>; iommus = < M4U_PORT_DISP_OVL0>; - mediatek,larb = <>; mediatek,gce-client-reg = < SUBSYS_1400 0xc000 0x1000>; }; @@ -1089,7 +1083,6 @@ power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_OVL1>; iommus = < M4U_PORT_DISP_OVL1>; - mediatek,larb = <>; mediatek,gce-client-reg = < SUBSYS_1400 0xd000 0x1000>; }; @@ -1100,7 +1093,6 @@ power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_RDMA0>; iommus = < M4U_PORT_DISP_RDMA0>; - mediatek,larb = <>; mediatek,gce-client-reg = < SUBSYS_1400 
0xe000 0x1000>; }; @@ -,7 +1103,6 @@ power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_RDMA1>; iommus = < M4U_PORT_DISP_RDMA1>; - mediatek,larb = <>; mediatek,gce-client-reg = < SUBSYS_1400 0xf000 0x1000>; }; @@ -1122,7 +1113,6 @@ power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_RDMA2>; iommus = < M4U_PORT_DISP_RDMA2>; - mediatek,larb = <>; mediatek,gce-client-reg = < SUBSYS_1401 0 0x1000>; }; @@ -1133,7 +1123,6 @@ power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_WDMA0>; iommus = < M4U_PORT_DISP_WDMA0>; - mediatek,larb = <>; mediatek,gce-client-reg = < SUBSYS_1401 0x1000 0x1000>; }; @@ -1144,7 +1133,6 @@ power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_WDMA1>; iommus = < M4U_PORT_DISP_WDMA1>; - mediatek,larb = <>; mediatek,gce-client-reg = < SUBSYS_1401 0x2000 0x1000>; }; @@ -1395,7 +1383,6 @@ <0 0x16027800 0 0x800>, /* VDEC_HWB */ <0 0x16028400 0 0x400>; /* VDEC_HWG */ interrupts = ; - mediatek,larb = <>; iommus = < M4U_PORT_HW_VDEC_MC_EXT>, < M4U_PORT_HW_VDEC_PP_EXT>, < M4U_PORT_HW_VDEC_AVC_MV_EXT>, @@ -1463,7 +1450,6 @@
[PATCH v8 11/12] arm: dts: mediatek: Get rid of mediatek,larb for MM nodes
After adding device_link between the IOMMU consumer and smi, the mediatek,larb is unnecessary now. CC: Matthias Brugger Signed-off-by: Yong Wu Reviewed-by: Evan Green Tested-by: Frank Wunderlich # BPI-R2/MT7623 --- arch/arm/boot/dts/mt2701.dtsi | 2 -- arch/arm/boot/dts/mt7623n.dtsi | 5 - 2 files changed, 7 deletions(-) diff --git a/arch/arm/boot/dts/mt2701.dtsi b/arch/arm/boot/dts/mt2701.dtsi index 4776f85d6d5b..ef583cfd3baf 100644 --- a/arch/arm/boot/dts/mt2701.dtsi +++ b/arch/arm/boot/dts/mt2701.dtsi @@ -564,7 +564,6 @@ clock-names = "jpgdec-smi", "jpgdec"; power-domains = < MT2701_POWER_DOMAIN_ISP>; - mediatek,larb = <>; iommus = < MT2701_M4U_PORT_JPGDEC_WDMA>, < MT2701_M4U_PORT_JPGDEC_BSDMA>; }; @@ -577,7 +576,6 @@ clocks = < CLK_IMG_VENC>; clock-names = "jpgenc"; power-domains = < MT2701_POWER_DOMAIN_ISP>; - mediatek,larb = <>; iommus = < MT2701_M4U_PORT_JPGENC_RDMA>, < MT2701_M4U_PORT_JPGENC_BSDMA>; }; diff --git a/arch/arm/boot/dts/mt7623n.dtsi b/arch/arm/boot/dts/mt7623n.dtsi index bcb0846e29fd..3adab5cd1fef 100644 --- a/arch/arm/boot/dts/mt7623n.dtsi +++ b/arch/arm/boot/dts/mt7623n.dtsi @@ -121,7 +121,6 @@ clock-names = "jpgdec-smi", "jpgdec"; power-domains = < MT2701_POWER_DOMAIN_ISP>; - mediatek,larb = <>; iommus = < MT2701_M4U_PORT_JPGDEC_WDMA>, < MT2701_M4U_PORT_JPGDEC_BSDMA>; }; @@ -144,7 +143,6 @@ interrupts = ; clocks = < CLK_MM_DISP_OVL>; iommus = < MT2701_M4U_PORT_DISP_OVL_0>; - mediatek,larb = <>; }; rdma0: rdma@14008000 { @@ -154,7 +152,6 @@ interrupts = ; clocks = < CLK_MM_DISP_RDMA>; iommus = < MT2701_M4U_PORT_DISP_RDMA>; - mediatek,larb = <>; }; wdma@14009000 { @@ -164,7 +161,6 @@ interrupts = ; clocks = < CLK_MM_DISP_WDMA>; iommus = < MT2701_M4U_PORT_DISP_WDMA>; - mediatek,larb = <>; }; bls: pwm@1400a000 { @@ -215,7 +211,6 @@ interrupts = ; clocks = < CLK_MM_DISP_RDMA1>; iommus = < MT2701_M4U_PORT_DISP_RDMA1>; - mediatek,larb = <>; }; dpi0: dpi@14014000 { -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org 
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 10/12] memory: mtk-smi: Get rid of mtk_smi_larb_get/put
After adding device_link between the iommu consumer and smi-larb, the pm_runtime_get(_sync) of smi-larb and smi-common will be called automatically. we can get rid of mtk_smi_larb_get/put. CC: Matthias Brugger Signed-off-by: Yong Wu Reviewed-by: Evan Green Acked-by: Krzysztof Kozlowski Acked-by: Matthias Brugger Reviewed-by: Dafna Hirschfeld Tested-by: Frank Wunderlich # BPI-R2/MT7623 --- drivers/memory/mtk-smi.c | 14 -- include/soc/mediatek/smi.h | 20 2 files changed, 34 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index c5fb51f73b34..7c61c924e220 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -134,20 +134,6 @@ static void mtk_smi_clk_disable(const struct mtk_smi *smi) clk_disable_unprepare(smi->clk_apb); } -int mtk_smi_larb_get(struct device *larbdev) -{ - int ret = pm_runtime_resume_and_get(larbdev); - - return (ret < 0) ? ret : 0; -} -EXPORT_SYMBOL_GPL(mtk_smi_larb_get); - -void mtk_smi_larb_put(struct device *larbdev) -{ - pm_runtime_put_sync(larbdev); -} -EXPORT_SYMBOL_GPL(mtk_smi_larb_put); - static int mtk_smi_larb_bind(struct device *dev, struct device *master, void *data) { diff --git a/include/soc/mediatek/smi.h b/include/soc/mediatek/smi.h index 15e3397cec58..11f7d6b59642 100644 --- a/include/soc/mediatek/smi.h +++ b/include/soc/mediatek/smi.h @@ -19,26 +19,6 @@ struct mtk_smi_larb_iommu { unsigned char bank[32]; }; -/* - * mtk_smi_larb_get: Enable the power domain and clocks for this local arbiter. - * It also initialize some basic setting(like iommu). - * mtk_smi_larb_put: Disable the power domain and clocks for this local arbiter. - * Both should be called in non-atomic context. - * - * Returns 0 if successful, negative on failure. 
- */ -int mtk_smi_larb_get(struct device *larbdev); -void mtk_smi_larb_put(struct device *larbdev); - -#else - -static inline int mtk_smi_larb_get(struct device *larbdev) -{ - return 0; -} - -static inline void mtk_smi_larb_put(struct device *larbdev) { } - #endif #endif -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 09/12] media: mtk-vcodec: Get rid of mtk_smi_larb_get/put
MediaTek IOMMU has already added the device_link between the consumer and smi-larb device. If the vcodec device call the pm_runtime_get_sync, the smi-larb's pm_runtime_get_sync also be called automatically. CC: Tiffany Lin CC: Irui Wang Signed-off-by: Yong Wu Reviewed-by: Evan Green Acked-by: Tiffany Lin Reviewed-by: Dafna Hirschfeld --- .../platform/mtk-vcodec/mtk_vcodec_dec_pm.c | 37 +++- .../platform/mtk-vcodec/mtk_vcodec_drv.h | 3 -- .../platform/mtk-vcodec/mtk_vcodec_enc.c | 1 - .../platform/mtk-vcodec/mtk_vcodec_enc_pm.c | 44 +++ 4 files changed, 10 insertions(+), 75 deletions(-) diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c index 6038db96f71c..d0bf9aa3b29d 100644 --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c @@ -8,14 +8,12 @@ #include #include #include -#include #include "mtk_vcodec_dec_pm.h" #include "mtk_vcodec_util.h" int mtk_vcodec_init_dec_pm(struct mtk_vcodec_dev *mtkdev) { - struct device_node *node; struct platform_device *pdev; struct mtk_vcodec_pm *pm; struct mtk_vcodec_clk *dec_clk; @@ -26,18 +24,7 @@ int mtk_vcodec_init_dec_pm(struct mtk_vcodec_dev *mtkdev) pm = >pm; pm->mtkdev = mtkdev; dec_clk = >vdec_clk; - node = of_parse_phandle(pdev->dev.of_node, "mediatek,larb", 0); - if (!node) { - mtk_v4l2_err("of_parse_phandle mediatek,larb fail!"); - return -1; - } - pdev = of_find_device_by_node(node); - of_node_put(node); - if (WARN_ON(!pdev)) { - return -1; - } - pm->larbvdec = >dev; pdev = mtkdev->plat_dev; pm->dev = >dev; @@ -47,14 +34,11 @@ int mtk_vcodec_init_dec_pm(struct mtk_vcodec_dev *mtkdev) dec_clk->clk_info = devm_kcalloc(>dev, dec_clk->clk_num, sizeof(*clk_info), GFP_KERNEL); - if (!dec_clk->clk_info) { - ret = -ENOMEM; - goto put_device; - } + if (!dec_clk->clk_info) + return -ENOMEM; } else { mtk_v4l2_err("Failed to get vdec clock count"); - ret = -EINVAL; - goto put_device; + return 
-EINVAL; } for (i = 0; i < dec_clk->clk_num; i++) { @@ -63,29 +47,24 @@ int mtk_vcodec_init_dec_pm(struct mtk_vcodec_dev *mtkdev) "clock-names", i, _info->clk_name); if (ret) { mtk_v4l2_err("Failed to get clock name id = %d", i); - goto put_device; + return ret; } clk_info->vcodec_clk = devm_clk_get(>dev, clk_info->clk_name); if (IS_ERR(clk_info->vcodec_clk)) { mtk_v4l2_err("devm_clk_get (%d)%s fail", i, clk_info->clk_name); - ret = PTR_ERR(clk_info->vcodec_clk); - goto put_device; + return PTR_ERR(clk_info->vcodec_clk); } } pm_runtime_enable(>dev); return 0; -put_device: - put_device(pm->larbvdec); - return ret; } void mtk_vcodec_release_dec_pm(struct mtk_vcodec_dev *dev) { pm_runtime_disable(dev->pm.dev); - put_device(dev->pm.larbvdec); } int mtk_vcodec_dec_pw_on(struct mtk_vcodec_pm *pm) @@ -122,11 +101,6 @@ void mtk_vcodec_dec_clock_on(struct mtk_vcodec_pm *pm) } } - ret = mtk_smi_larb_get(pm->larbvdec); - if (ret) { - mtk_v4l2_err("mtk_smi_larb_get larbvdec fail %d", ret); - goto error; - } return; error: @@ -139,7 +113,6 @@ void mtk_vcodec_dec_clock_off(struct mtk_vcodec_pm *pm) struct mtk_vcodec_clk *dec_clk = >vdec_clk; int i = 0; - mtk_smi_larb_put(pm->larbvdec); for (i = dec_clk->clk_num - 1; i >= 0; i--) clk_disable_unprepare(dec_clk->clk_info[i].vcodec_clk); } diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h index c6c7672fecfb..64b73dd880ce 100644 --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h @@ -189,10 +189,7 @@ struct mtk_vcodec_clk { */ struct mtk_vcodec_pm { struct mtk_vcodec_clk vdec_clk; - struct device *larbvdec; - struct mtk_vcodec_clk venc_clk; - struct device *larbvenc; struct device *dev; struct mtk_vcodec_dev *mtkdev; }; diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c index
[PATCH v8 08/12] drm/mediatek: Get rid of mtk_smi_larb_get/put
MediaTek IOMMU has already added the device_link between the consumer and smi-larb device. If the drm device call the pm_runtime_get_sync, the smi-larb's pm_runtime_get_sync also be called automatically. CC: CK Hu CC: Philipp Zabel Signed-off-by: Yong Wu Reviewed-by: Evan Green Acked-by: Chun-Kuang Hu Reviewed-by: Dafna Hirschfeld Tested-by: Frank Wunderlich # BPI-R2/MT7623 --- drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 10 -- drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 36 ++--- drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h | 1 - drivers/gpu/drm/mediatek/mtk_drm_drv.c | 5 +-- 4 files changed, 3 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c index 455ea23c6130..445c30cc823f 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c @@ -12,7 +12,6 @@ #include #include -#include #include #include @@ -643,22 +642,14 @@ static void mtk_drm_crtc_atomic_enable(struct drm_crtc *crtc, DRM_DEBUG_DRIVER("%s %d\n", __func__, crtc->base.id); - ret = mtk_smi_larb_get(comp->larb_dev); - if (ret) { - DRM_ERROR("Failed to get larb: %d\n", ret); - return; - } - ret = pm_runtime_resume_and_get(comp->dev); if (ret < 0) { - mtk_smi_larb_put(comp->larb_dev); DRM_DEV_ERROR(comp->dev, "Failed to enable power domain: %d\n", ret); return; } ret = mtk_crtc_ddp_hw_init(mtk_crtc); if (ret) { - mtk_smi_larb_put(comp->larb_dev); pm_runtime_put(comp->dev); return; } @@ -695,7 +686,6 @@ static void mtk_drm_crtc_atomic_disable(struct drm_crtc *crtc, drm_crtc_vblank_off(crtc); mtk_crtc_ddp_hw_fini(mtk_crtc); - mtk_smi_larb_put(comp->larb_dev); ret = pm_runtime_put(comp->dev); if (ret < 0) DRM_DEV_ERROR(comp->dev, "Failed to disable power domain: %d\n", ret); diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c index 99cbf44463e4..48642e814370 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c +++ 
b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c @@ -414,37 +414,15 @@ unsigned int mtk_drm_find_possible_crtc_by_comp(struct drm_device *drm, return ret; } -static int mtk_ddp_get_larb_dev(struct device_node *node, struct mtk_ddp_comp *comp, - struct device *dev) -{ - struct device_node *larb_node; - struct platform_device *larb_pdev; - - larb_node = of_parse_phandle(node, "mediatek,larb", 0); - if (!larb_node) { - dev_err(dev, "Missing mediadek,larb phandle in %pOF node\n", node); - return -EINVAL; - } - - larb_pdev = of_find_device_by_node(larb_node); - if (!larb_pdev) { - dev_warn(dev, "Waiting for larb device %pOF\n", larb_node); - of_node_put(larb_node); - return -EPROBE_DEFER; - } - of_node_put(larb_node); - comp->larb_dev = _pdev->dev; - - return 0; -} - int mtk_ddp_comp_init(struct device_node *node, struct mtk_ddp_comp *comp, enum mtk_ddp_comp_id comp_id) { struct platform_device *comp_pdev; enum mtk_ddp_comp_type type; struct mtk_ddp_comp_dev *priv; +#if IS_REACHABLE(CONFIG_MTK_CMDQ) int ret; +#endif if (comp_id < 0 || comp_id >= DDP_COMPONENT_ID_MAX) return -EINVAL; @@ -460,16 +438,6 @@ int mtk_ddp_comp_init(struct device_node *node, struct mtk_ddp_comp *comp, } comp->dev = _pdev->dev; - /* Only DMA capable components need the LARB property */ - if (type == MTK_DISP_OVL || - type == MTK_DISP_OVL_2L || - type == MTK_DISP_RDMA || - type == MTK_DISP_WDMA) { - ret = mtk_ddp_get_larb_dev(node, comp, comp->dev); - if (ret) - return ret; - } - if (type == MTK_DISP_AAL || type == MTK_DISP_BLS || type == MTK_DISP_CCORR || diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h index bb914d976cf5..1b582262b682 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h +++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h @@ -70,7 +70,6 @@ struct mtk_ddp_comp_funcs { struct mtk_ddp_comp { struct device *dev; int irq; - struct device *larb_dev; enum mtk_ddp_comp_id id; const struct mtk_ddp_comp_funcs *funcs; }; diff --git 
a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c index aec39724ebeb..c234293fc2c3 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c @@ -603,11 +603,8 @@ static int mtk_drm_probe(struct
[PATCH v8 07/12] drm/mediatek: Add pm runtime support for ovl and rdma
From: Yongqiang Niu Prepare for smi cleaning up "mediatek,larb". Display use the dispsys device to call pm_rumtime_get_sync before. This patch add pm_runtime_xx with ovl and rdma device whose nodes has "iommus" property, then display could help pm_runtime_get for smi via ovl or rdma device. CC: CK Hu Signed-off-by: Yongqiang Niu Signed-off-by: Yong Wu (Yong: Use pm_runtime_resume_and_get instead of pm_runtime_get_sync) Acked-by: Chun-Kuang Hu Tested-by: Frank Wunderlich # BPI-R2/MT7623 --- drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 8 +++- drivers/gpu/drm/mediatek/mtk_disp_rdma.c | 9 - drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 13 - 3 files changed, 27 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c index 5326989d5206..716eac6831f2 100644 --- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c +++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include "mtk_disp_drv.h" @@ -414,9 +415,13 @@ static int mtk_disp_ovl_probe(struct platform_device *pdev) return ret; } + pm_runtime_enable(dev); + ret = component_add(dev, _disp_ovl_component_ops); - if (ret) + if (ret) { + pm_runtime_disable(dev); dev_err(dev, "Failed to add component: %d\n", ret); + } return ret; } @@ -424,6 +429,7 @@ static int mtk_disp_ovl_probe(struct platform_device *pdev) static int mtk_disp_ovl_remove(struct platform_device *pdev) { component_del(>dev, _disp_ovl_component_ops); + pm_runtime_disable(>dev); return 0; } diff --git a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c index 75d7f45579e2..251f034acb09 100644 --- a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c +++ b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include "mtk_disp_drv.h" @@ -327,9 +328,13 @@ static int mtk_disp_rdma_probe(struct platform_device *pdev) platform_set_drvdata(pdev, priv); + pm_runtime_enable(dev); + ret = 
component_add(dev, _disp_rdma_component_ops); - if (ret) + if (ret) { + pm_runtime_disable(dev); dev_err(dev, "Failed to add component: %d\n", ret); + } return ret; } @@ -338,6 +343,8 @@ static int mtk_disp_rdma_remove(struct platform_device *pdev) { component_del(>dev, _disp_rdma_component_ops); + pm_runtime_disable(>dev); + return 0; } diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c index 5f81489fc60c..455ea23c6130 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c @@ -649,9 +649,17 @@ static void mtk_drm_crtc_atomic_enable(struct drm_crtc *crtc, return; } + ret = pm_runtime_resume_and_get(comp->dev); + if (ret < 0) { + mtk_smi_larb_put(comp->larb_dev); + DRM_DEV_ERROR(comp->dev, "Failed to enable power domain: %d\n", ret); + return; + } + ret = mtk_crtc_ddp_hw_init(mtk_crtc); if (ret) { mtk_smi_larb_put(comp->larb_dev); + pm_runtime_put(comp->dev); return; } @@ -664,7 +672,7 @@ static void mtk_drm_crtc_atomic_disable(struct drm_crtc *crtc, { struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); struct mtk_ddp_comp *comp = mtk_crtc->ddp_comp[0]; - int i; + int i, ret; DRM_DEBUG_DRIVER("%s %d\n", __func__, crtc->base.id); if (!mtk_crtc->enabled) @@ -688,6 +696,9 @@ static void mtk_drm_crtc_atomic_disable(struct drm_crtc *crtc, drm_crtc_vblank_off(crtc); mtk_crtc_ddp_hw_fini(mtk_crtc); mtk_smi_larb_put(comp->larb_dev); + ret = pm_runtime_put(comp->dev); + if (ret < 0) + DRM_DEV_ERROR(comp->dev, "Failed to disable power domain: %d\n", ret); mtk_crtc->enabled = false; } -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 06/12] media: mtk-mdp: Get rid of mtk_smi_larb_get/put
MediaTek IOMMU has already added the device_link between the consumer and smi-larb device. If the mdp device call the pm_runtime_get_sync, the smi-larb's pm_runtime_get_sync also be called automatically. CC: Minghsiu Tsai CC: Houlong Wei Signed-off-by: Yong Wu Reviewed-by: Evan Green Reviewed-by: Houlong Wei Reviewed-by: Dafna Hirschfeld --- drivers/media/platform/mtk-mdp/mtk_mdp_comp.c | 40 --- drivers/media/platform/mtk-mdp/mtk_mdp_comp.h | 2 - drivers/media/platform/mtk-mdp/mtk_mdp_core.c | 1 - 3 files changed, 43 deletions(-) diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c index b3426a551bea..1e3833f1c9ae 100644 --- a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c @@ -9,7 +9,6 @@ #include #include #include -#include #include "mtk_mdp_comp.h" @@ -18,14 +17,6 @@ void mtk_mdp_comp_clock_on(struct device *dev, struct mtk_mdp_comp *comp) { int i, err; - if (comp->larb_dev) { - err = mtk_smi_larb_get(comp->larb_dev); - if (err) - dev_err(dev, - "failed to get larb, err %d. 
type:%d\n", - err, comp->type); - } - for (i = 0; i < ARRAY_SIZE(comp->clk); i++) { if (IS_ERR(comp->clk[i])) continue; @@ -46,17 +37,12 @@ void mtk_mdp_comp_clock_off(struct device *dev, struct mtk_mdp_comp *comp) continue; clk_disable_unprepare(comp->clk[i]); } - - if (comp->larb_dev) - mtk_smi_larb_put(comp->larb_dev); } int mtk_mdp_comp_init(struct device *dev, struct device_node *node, struct mtk_mdp_comp *comp, enum mtk_mdp_comp_type comp_type) { - struct device_node *larb_node; - struct platform_device *larb_pdev; int ret; int i; @@ -77,32 +63,6 @@ int mtk_mdp_comp_init(struct device *dev, struct device_node *node, break; } - /* Only DMA capable components need the LARB property */ - comp->larb_dev = NULL; - if (comp->type != MTK_MDP_RDMA && - comp->type != MTK_MDP_WDMA && - comp->type != MTK_MDP_WROT) - return 0; - - larb_node = of_parse_phandle(node, "mediatek,larb", 0); - if (!larb_node) { - dev_err(dev, - "Missing mediadek,larb phandle in %pOF node\n", node); - ret = -EINVAL; - goto put_dev; - } - - larb_pdev = of_find_device_by_node(larb_node); - if (!larb_pdev) { - dev_warn(dev, "Waiting for larb device %pOF\n", larb_node); - of_node_put(larb_node); - ret = -EPROBE_DEFER; - goto put_dev; - } - of_node_put(larb_node); - - comp->larb_dev = _pdev->dev; - return 0; put_dev: diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h index 7897766c96bb..ae41dd3cd72a 100644 --- a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h @@ -26,14 +26,12 @@ enum mtk_mdp_comp_type { * @node: list node to track sibing MDP components * @dev_node: component device node * @clk: clocks required for component - * @larb_dev: SMI device required for component * @type: component type */ struct mtk_mdp_comp { struct list_headnode; struct device_node *dev_node; struct clk *clk[2]; - struct device *larb_dev; enum mtk_mdp_comp_type type; }; diff --git 
a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c index 976aa1f4829b..70a8eab16863 100644 --- a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c @@ -17,7 +17,6 @@ #include #include #include -#include #include "mtk_mdp_core.h" #include "mtk_mdp_m2m.h" -- 2.18.0
[PATCH v8 05/12] media: mtk-jpeg: Get rid of mtk_smi_larb_get/put
MediaTek IOMMU has already added device_link between the consumer and smi-larb device. If the jpg device call the pm_runtime_get_sync, the smi-larb's pm_runtime_get_sync also be called automatically. After removing the larb_get operations, then mtk_jpeg_clk_init is also unnecessary. Remove it too. CC: Rick Chang CC: Xia Jiang Signed-off-by: Yong Wu Reviewed-by: Evan Green Acked-by: Rick Chang Reviewed-by: Dafna Hirschfeld Tested-by: Frank Wunderlich # BPI-R2/MT7623 --- .../media/platform/mtk-jpeg/mtk_jpeg_core.c | 45 +-- .../media/platform/mtk-jpeg/mtk_jpeg_core.h | 2 - 2 files changed, 2 insertions(+), 45 deletions(-) diff --git a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c index a89c7b206eef..4fea2c512434 100644 --- a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c +++ b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c @@ -22,7 +22,6 @@ #include #include #include -#include #include "mtk_jpeg_enc_hw.h" #include "mtk_jpeg_dec_hw.h" @@ -1055,10 +1054,6 @@ static void mtk_jpeg_clk_on(struct mtk_jpeg_dev *jpeg) { int ret; - ret = mtk_smi_larb_get(jpeg->larb); - if (ret) - dev_err(jpeg->dev, "mtk_smi_larb_get larbvdec fail %d\n", ret); - ret = clk_bulk_prepare_enable(jpeg->variant->num_clks, jpeg->variant->clks); if (ret) @@ -1069,7 +1064,6 @@ static void mtk_jpeg_clk_off(struct mtk_jpeg_dev *jpeg) { clk_bulk_disable_unprepare(jpeg->variant->num_clks, jpeg->variant->clks); - mtk_smi_larb_put(jpeg->larb); } static irqreturn_t mtk_jpeg_enc_done(struct mtk_jpeg_dev *jpeg) @@ -1284,35 +1278,6 @@ static struct clk_bulk_data mtk_jpeg_clocks[] = { { .id = "jpgenc" }, }; -static int mtk_jpeg_clk_init(struct mtk_jpeg_dev *jpeg) -{ - struct device_node *node; - struct platform_device *pdev; - int ret; - - node = of_parse_phandle(jpeg->dev->of_node, "mediatek,larb", 0); - if (!node) - return -EINVAL; - pdev = of_find_device_by_node(node); - if (WARN_ON(!pdev)) { - of_node_put(node); - return -EINVAL; - } - of_node_put(node); - 
- jpeg->larb = >dev; - - ret = devm_clk_bulk_get(jpeg->dev, jpeg->variant->num_clks, - jpeg->variant->clks); - if (ret) { - dev_err(>dev, "failed to get jpeg clock:%d\n", ret); - put_device(>dev); - return ret; - } - - return 0; -} - static void mtk_jpeg_job_timeout_work(struct work_struct *work) { struct mtk_jpeg_dev *jpeg = container_of(work, struct mtk_jpeg_dev, @@ -1333,11 +1298,6 @@ static void mtk_jpeg_job_timeout_work(struct work_struct *work) v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); } -static inline void mtk_jpeg_clk_release(struct mtk_jpeg_dev *jpeg) -{ - put_device(jpeg->larb); -} - static int mtk_jpeg_probe(struct platform_device *pdev) { struct mtk_jpeg_dev *jpeg; @@ -1376,7 +1336,8 @@ static int mtk_jpeg_probe(struct platform_device *pdev) goto err_req_irq; } - ret = mtk_jpeg_clk_init(jpeg); + ret = devm_clk_bulk_get(jpeg->dev, jpeg->variant->num_clks, + jpeg->variant->clks); if (ret) { dev_err(>dev, "Failed to init clk, err %d\n", ret); goto err_clk_init; @@ -1442,7 +1403,6 @@ static int mtk_jpeg_probe(struct platform_device *pdev) v4l2_device_unregister(>v4l2_dev); err_dev_register: - mtk_jpeg_clk_release(jpeg); err_clk_init: @@ -1460,7 +1420,6 @@ static int mtk_jpeg_remove(struct platform_device *pdev) video_device_release(jpeg->vdev); v4l2_m2m_release(jpeg->m2m_dev); v4l2_device_unregister(>v4l2_dev); - mtk_jpeg_clk_release(jpeg); return 0; } diff --git a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h index 595f7f10c9fd..3e4811a41ba2 100644 --- a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h +++ b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h @@ -85,7 +85,6 @@ struct mtk_jpeg_variant { * @alloc_ctx: videobuf2 memory allocator's context * @vdev: video device node for jpeg mem2mem mode * @reg_base: JPEG registers mapping - * @larb: SMI device * @job_timeout_work: IRQ timeout structure * @variant: driver variant to be used */ @@ -99,7 +98,6 @@ struct mtk_jpeg_dev { void*alloc_ctx; 
struct video_device *vdev; void __iomem*reg_base; - struct device *larb; struct delayed_work job_timeout_work; const struct mtk_jpeg_variant *variant; }; -- 2.18.0
[PATCH v8 04/12] iommu/mediatek: Add device_link between the consumer and the larb devices
MediaTek IOMMU-SMI diagram is like below. all the consumer connect with smi-larb, then connect with smi-common. M4U | smi-common | - | |... | | larb1 larb2 | | vdec venc When the consumer works, it should enable the smi-larb's power which also need enable the smi-common's power firstly. Thus, First of all, use the device link connect the consumer and the smi-larbs. then add device link between the smi-larb and smi-common. This patch adds device_link between the consumer and the larbs. When device_link_add, I add the flag DL_FLAG_STATELESS to avoid calling pm_runtime_xx to keep the original status of clocks. It can avoid two issues: 1) Display HW show fastlogo abnormally reported in [1]. At the beggining, all the clocks are enabled before entering kernel, but the clocks for display HW(always in larb0) will be gated after clk_enable and clk_disable called from device_link_add(->pm_runtime_resume) and rpm_idle. The clock operation happened before display driver probe. At that time, the display HW will be abnormal. 2) A deadlock issue reported in [2]. Use DL_FLAG_STATELESS to skip pm_runtime_xx to avoid the deadlock. Corresponding, DL_FLAG_AUTOREMOVE_CONSUMER can't be added, then device_link_removed should be added explicitly. 
[1] https://lore.kernel.org/linux-mediatek/1564213888.22908.4.camel@mhfsdcap03/ [2] https://lore.kernel.org/patchwork/patch/1086569/ Suggested-by: Tomasz Figa Signed-off-by: Yong Wu Tested-by: Frank Wunderlich # BPI-R2/MT7623 --- drivers/iommu/mtk_iommu.c| 22 ++ drivers/iommu/mtk_iommu_v1.c | 20 +++- 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index d5848f78a677..a2fa55899434 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -560,22 +560,44 @@ static struct iommu_device *mtk_iommu_probe_device(struct device *dev) { struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct mtk_iommu_data *data; + struct device_link *link; + struct device *larbdev; + unsigned int larbid; if (!fwspec || fwspec->ops != _iommu_ops) return ERR_PTR(-ENODEV); /* Not a iommu client device */ data = dev_iommu_priv_get(dev); + /* +* Link the consumer device with the smi-larb device(supplier) +* The device in each a larb is a independent HW. thus only link +* one larb here. 
+*/ + larbid = MTK_M4U_TO_LARB(fwspec->ids[0]); + larbdev = data->larb_imu[larbid].dev; + link = device_link_add(dev, larbdev, + DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS); + if (!link) + dev_err(dev, "Unable to link %s\n", dev_name(larbdev)); return >iommu; } static void mtk_iommu_release_device(struct device *dev) { struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + struct mtk_iommu_data *data; + struct device *larbdev; + unsigned int larbid; if (!fwspec || fwspec->ops != _iommu_ops) return; + data = dev_iommu_priv_get(dev); + larbid = MTK_M4U_TO_LARB(fwspec->ids[0]); + larbdev = data->larb_imu[larbid].dev; + device_link_remove(dev, larbdev); + iommu_fwspec_free(dev); } diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 4d7809432239..e6f13459470e 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -423,7 +423,9 @@ static struct iommu_device *mtk_iommu_probe_device(struct device *dev) struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct of_phandle_args iommu_spec; struct mtk_iommu_data *data; - int err, idx = 0; + int err, idx = 0, larbid; + struct device_link *link; + struct device *larbdev; /* * In the deferred case, free the existed fwspec. 
@@ -453,6 +455,14 @@ static struct iommu_device *mtk_iommu_probe_device(struct device *dev) data = dev_iommu_priv_get(dev); + /* Link the consumer device with the smi-larb device(supplier) */ + larbid = mt2701_m4u_to_larb(fwspec->ids[0]); + larbdev = data->larb_imu[larbid].dev; + link = device_link_add(dev, larbdev, + DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS); + if (!link) + dev_err(dev, "Unable to link %s\n", dev_name(larbdev)); + return >iommu; } @@ -473,10 +483,18 @@ static void mtk_iommu_probe_finalize(struct device *dev) static void mtk_iommu_release_device(struct device *dev) { struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + struct mtk_iommu_data *data; + struct device *larbdev; + unsigned int larbid; if (!fwspec || fwspec->ops != _iommu_ops) return; + data =
[PATCH v8 03/12] iommu/mediatek: Add probe_defer for smi-larb
Prepare for adding device_link. The iommu consumer should use device_link to connect with the smi-larb(supplier). then the smi-larb should run before the iommu consumer. Here we delay the iommu driver until the smi driver is ready, then all the iommu consumers always are after the smi driver. When there is no this patch, if some consumer drivers run before smi-larb, the supplier link_status is DL_DEV_NO_DRIVER(0) in the device_link_add, then device_links_driver_bound will use WARN_ON to complain that the link_status of supplier is not right. device_is_bound may be more elegant here. but it is not allowed to EXPORT from https://lore.kernel.org/patchwork/patch/1334670/. Signed-off-by: Yong Wu Tested-by: Frank Wunderlich # BPI-R2/MT7623 --- drivers/iommu/mtk_iommu.c| 2 +- drivers/iommu/mtk_iommu_v1.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index d837adfd1da5..d5848f78a677 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -844,7 +844,7 @@ static int mtk_iommu_probe(struct platform_device *pdev) id = i; plarbdev = of_find_device_by_node(larbnode); - if (!plarbdev) { + if (!plarbdev || !plarbdev->dev.driver) { of_node_put(larbnode); return -EPROBE_DEFER; } diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 1467ba1e4417..4d7809432239 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -602,7 +602,7 @@ static int mtk_iommu_probe(struct platform_device *pdev) } plarbdev = of_find_device_by_node(larbnode); - if (!plarbdev) { + if (!plarbdev || !plarbdev->dev.driver) { of_node_put(larbnode); return -EPROBE_DEFER; } -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 02/12] iommu/mediatek-v1: Free the existing fwspec if the master dev already has one
When the iommu master device enters of_iommu_xlate, the ops may be NULL(iommu dev is defered), then it will initialize the fwspec here: [] (dev_iommu_fwspec_set) from [] (iommu_fwspec_init+0xbc/0xd4) [] (iommu_fwspec_init) from [] (of_iommu_xlate+0x7c/0x12c) [] (of_iommu_xlate) from [] (of_iommu_configure+0x144/0x1e8) BUT the mtk_iommu_v1.c only supports arm32, the probing flow still is a bit weird. We always expect create the fwspec internally. otherwise it will enter here and return fail. static int mtk_iommu_create_mapping(struct device *dev, struct of_phandle_args *args) { ... if (!fwspec) { } else if (dev_iommu_fwspec_get(dev)->ops != _iommu_ops) { >>Enter here. return fail. return -EINVAL; } ... } Thus, Free the existed fwspec if the master device already has fwspec. This issue is reported at: https://lore.kernel.org/linux-mediatek/trinity-7d9ebdc9-4849-4d93-bfb5-429dcb4ee449-1626253158870@3c-app-gmx-bs01/ Reported-by: Frank Wunderlich Signed-off-by: Yong Wu --- drivers/iommu/mtk_iommu_v1.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index be22fcf988ce..1467ba1e4417 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -425,6 +425,15 @@ static struct iommu_device *mtk_iommu_probe_device(struct device *dev) struct mtk_iommu_data *data; int err, idx = 0; + /* +* In the deferred case, free the existed fwspec. +* Always initialize the fwspec internally. +*/ + if (fwspec) { + iommu_fwspec_free(dev); + fwspec = dev_iommu_fwspec_get(dev); + } + while (!of_parse_phandle_with_args(dev->of_node, "iommus", "#iommu-cells", idx, _spec)) { -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v8 01/12] dt-binding: mediatek: Get rid of mediatek,larb for multimedia HW
After adding device_link between the consumer and the smi-larbs, when the consumer calls its own pm_runtime_get(_sync), the pm_runtime_get(_sync) of smi-larb and smi-common is called automatically. Thus, the consumer no longer needs the property, and the IOMMU already knows which larb the consumer connects with from the iommu id in the "iommus=" property.

Signed-off-by: Yong Wu
Reviewed-by: Rob Herring
Reviewed-by: Evan Green
---
 .../bindings/display/mediatek/mediatek,disp.txt          | 9 ---------
 .../devicetree/bindings/media/mediatek-jpeg-decoder.yaml | 9 ---------
 .../devicetree/bindings/media/mediatek-jpeg-encoder.yaml | 9 ---------
 Documentation/devicetree/bindings/media/mediatek-mdp.txt | 8 --------
 .../devicetree/bindings/media/mediatek-vcodec.txt        | 4 ----
 5 files changed, 39 deletions(-)

diff --git a/Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt b/Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt
index fbb59c9ddda6..867bd82e2f03 100644
--- a/Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt
+++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt
@@ -61,8 +61,6 @@ Required properties (DMA function blocks):
   "mediatek,<chip>-disp-rdma"
   "mediatek,<chip>-disp-wdma"
   the supported chips are mt2701, mt8167 and mt8173.
-- larb: Should contain a phandle pointing to the local arbiter device as defined
-  in Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
 - iommus: Should point to the respective IOMMU block with master port as
   argument, see Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
   for details.
@@ -91,7 +89,6 @@ ovl0: ovl@1400c000 { power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_OVL0>; iommus = < M4U_PORT_DISP_OVL0>; - mediatek,larb = <>; }; ovl1: ovl@1400d000 { @@ -101,7 +98,6 @@ ovl1: ovl@1400d000 { power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_OVL1>; iommus = < M4U_PORT_DISP_OVL1>; - mediatek,larb = <>; }; rdma0: rdma@1400e000 { @@ -111,7 +107,6 @@ rdma0: rdma@1400e000 { power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_RDMA0>; iommus = < M4U_PORT_DISP_RDMA0>; - mediatek,larb = <>; mediatek,rdma-fifosize = <8192>; }; @@ -122,7 +117,6 @@ rdma1: rdma@1400f000 { power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_RDMA1>; iommus = < M4U_PORT_DISP_RDMA1>; - mediatek,larb = <>; }; rdma2: rdma@1401 { @@ -132,7 +126,6 @@ rdma2: rdma@1401 { power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_RDMA2>; iommus = < M4U_PORT_DISP_RDMA2>; - mediatek,larb = <>; }; wdma0: wdma@14011000 { @@ -142,7 +135,6 @@ wdma0: wdma@14011000 { power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_WDMA0>; iommus = < M4U_PORT_DISP_WDMA0>; - mediatek,larb = <>; }; wdma1: wdma@14012000 { @@ -152,7 +144,6 @@ wdma1: wdma@14012000 { power-domains = < MT8173_POWER_DOMAIN_MM>; clocks = < CLK_MM_DISP_WDMA1>; iommus = < M4U_PORT_DISP_WDMA1>; - mediatek,larb = <>; }; color0: color@14013000 { diff --git a/Documentation/devicetree/bindings/media/mediatek-jpeg-decoder.yaml b/Documentation/devicetree/bindings/media/mediatek-jpeg-decoder.yaml index 9b87f036f178..052e752157b4 100644 --- a/Documentation/devicetree/bindings/media/mediatek-jpeg-decoder.yaml +++ b/Documentation/devicetree/bindings/media/mediatek-jpeg-decoder.yaml @@ -42,13 +42,6 @@ properties: power-domains: maxItems: 1 - mediatek,larb: -$ref: '/schemas/types.yaml#/definitions/phandle' -description: | - Must contain the local arbiters in the current Socs, see - Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml - for 
details. - iommus: maxItems: 2 description: | @@ -63,7 +56,6 @@ required: - clocks - clock-names - power-domains - - mediatek,larb - iommus additionalProperties: false @@ -83,7 +75,6 @@ examples: clock-names = "jpgdec-smi", "jpgdec"; power-domains = < MT2701_POWER_DOMAIN_ISP>; - mediatek,larb = <>; iommus = < MT2701_M4U_PORT_JPGDEC_WDMA>, < MT2701_M4U_PORT_JPGDEC_BSDMA>; }; diff --git a/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml b/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml index fcd9b829e036..8bfdfdfaba59 100644 --- a/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml +++ b/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml @@ -35,13 +35,6 @@ properties: power-domains: maxItems: 1 - mediatek,larb: -$ref: '/schemas/types.yaml#/definitions/phandle' -description: | - Must contain the local arbiters in the current Socs, see -
[PATCH v8 00/12] Clean up "mediatek,larb"
The MediaTek IOMMU block diagram always looks like below:

                M4U
                 |
            smi-common
                 |
        -----------------
        |               |
       ...             ...
        |               |
      larb1           larb2
        |               |
      vdec            venc

All the consumers connect with smi-larb, which then connects with smi-common. When a consumer works, it should enable the smi-larb's power, which in turn requires enabling the smi-common's power first.

Thus, first use device_link to connect the consumer and the smi-larbs, then add a device_link between the smi-larb and smi-common. After adding the device_link, the "mediatek,larb" property can be removed, and the iommu consumer no longer needs to call mtk_smi_larb_get/put to enable the power and clock of smi-larb and smi-common.

About the MM dt-binding/dtsi patches, I guess they should go together, thus I don't split them per MM module and per SoC.

Based on a jpeg dt-binding patchset[1] that already got the necessary R-b.

[1] https://lore.kernel.org/linux-mediatek/20210702102304.3346429-1-hsi...@chromium.org/

Change notes:
v8: 1) Rebase on v5.15-rc1.
    2) Don't rebase the below mdp patchset that may still need more discussion:
       https://lore.kernel.org/linux-mediatek/20210709022324.1607884-1-ei...@chromium.org/
    3) Add Frank's Tested-by. Remove Dafna's Tested-by as requested.

v7: https://lore.kernel.org/linux-mediatek/20210730025238.22456-1-yong...@mediatek.com/
    1) Fix an arm32 boot failure, reported by Frank.
    2) Add a return on failure in the mtk drm, suggested by Dafna.

v6: https://lore.kernel.org/linux-mediatek/20210714025626.5528-1-yong...@mediatek.com/
    1) Rebase on v5.14-rc1.
    2) Fix the issues commented on in v5 by Dafna and Hsin-Yi.
    3) Remove the patches about using pm_runtime_resume_and_get since they have already been merged via other patches.

v5: https://lore.kernel.org/linux-mediatek/20210410091128.31823-1-yong...@mediatek.com/
    1) Base on v5.12-rc2.
    2) Remove the patch changing mtk-iommu to module_platform_driver; it has already become an independent patch.

v4: https://lore.kernel.org/linux-mediatek/1590826218-23653-1-git-send-email-yong...@mediatek.com/ based on v5.7-rc1.
    1) Move the drm PM patch before the smi patches.
    2) Change builtin_platform_driver to module_platform_driver since we may need to build it as a module.
    3) Rebase many patches as above.

v3: https://lore.kernel.org/linux-iommu/1567503456-24725-1-git-send-email-yong...@mediatek.com/
    1) Rebase on v5.3-rc1 and the latest mt8183 patchset.
    2) Use device_is_bound to check whether the driver is ready, from Matthias.
    3) Add the DL_FLAG_STATELESS flag when calling device_link_add and explain the reason in the commit message [3/14].
    4) Add a display patch [12/14] into this series; otherwise it may affect the display HW fastlogo, even though that doesn't happen on mt8183.

v2: https://lore.kernel.org/linux-iommu/1560171313-28299-1-git-send-email-yong...@mediatek.com/
    1) Rebase on v5.2-rc1.
    2) Move adding the device_link between the consumer and smi-larb into iommu_add_device, from Robin.
    3) Add DL_FLAG_AUTOREMOVE_CONSUMER even though the smi is built-in, from Evan.
    4) Remove the shutdown callback in iommu.

v1: https://lore.kernel.org/linux-iommu/1546318276-18993-1-git-send-email-yong...@mediatek.com/

Yong Wu (11):
  dt-binding: mediatek: Get rid of mediatek,larb for multimedia HW
  iommu/mediatek-v1: Free the existing fwspec if the master dev already has one
  iommu/mediatek: Add probe_defer for smi-larb
  iommu/mediatek: Add device_link between the consumer and the larb devices
  media: mtk-jpeg: Get rid of mtk_smi_larb_get/put
  media: mtk-mdp: Get rid of mtk_smi_larb_get/put
  drm/mediatek: Get rid of mtk_smi_larb_get/put
  media: mtk-vcodec: Get rid of mtk_smi_larb_get/put
  memory: mtk-smi: Get rid of mtk_smi_larb_get/put
  arm: dts: mediatek: Get rid of mediatek,larb for MM nodes
  arm64: dts: mediatek: Get rid of mediatek,larb for MM nodes

Yongqiang Niu (1):
  drm/mediatek: Add pm runtime support for ovl and rdma

 .../display/mediatek/mediatek,disp.txt        | 9 ---------
 .../bindings/media/mediatek-jpeg-decoder.yaml | 9 ---------
 .../bindings/media/mediatek-jpeg-encoder.yaml | 9 ---------
 .../bindings/media/mediatek-mdp.txt           | 8 --------
 .../bindings/media/mediatek-vcodec.txt        | 4 --
arch/arm/boot/dts/mt2701.dtsi | 2 - arch/arm/boot/dts/mt7623n.dtsi| 5 --- arch/arm64/boot/dts/mediatek/mt8173.dtsi | 16 --- arch/arm64/boot/dts/mediatek/mt8183.dtsi | 6 --- drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 8 +++- drivers/gpu/drm/mediatek/mtk_disp_rdma.c | 9 +++- drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 15 --- drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 36 +-- drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h | 1 - drivers/gpu/drm/mediatek/mtk_drm_drv.c| 5 +-- drivers/iommu/mtk_iommu.c | 24 +- drivers/iommu/mtk_iommu_v1.c | 31
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
Hi, Tony,

On Tue, Sep 28, 2021 at 06:06:52PM -0700, Luck, Tony wrote:
> >> fpregs_lock();
> >
> > I'm afraid we may hit the same locking issue when we send IPI to notify
> > another task to modify its PASID state. Here the API is called to modify
> > another running task's PASID state as well without a right lock.
> > fpregs_lock() is not enough to deal with this, I'm afraid.
>
> We don't send IPI any more to change PASID state. The only place that the
> current patch series touches the PASID MSR is in the #GP fault handler.

It's safe for the helpers to handle the PASID case (modifying the current
task's PASID state in #GP). But the helpers seem to be generic: they take
"task" as a parameter and handle the non-current task case as well. So the
helpers are not for PASID only; they may be used by others to modify a
running task's FPU state. But it's not safe to do so. At the least, don't
the helpers need some comments/restrictions about being used on a running
task?

Thanks.

-Fenghua
RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
>> fpregs_lock();
>
> I'm afraid we may hit the same locking issue when we send IPI to notify
> another task to modify its PASID state. Here the API is called to modify
> another running task's PASID state as well without a right lock.
> fpregs_lock() is not enough to deal with this, I'm afraid.

We don't send IPI any more to change PASID state. The only place that the
current patch series touches the PASID MSR is in the #GP fault handler.

-Tony
RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
> From: Jason Gunthorpe
> Sent: Tuesday, September 28, 2021 10:07 PM
>
> On Tue, Sep 28, 2021 at 09:35:05PM +0800, Lu Baolu wrote:
> > Another issue is, when putting a device into user-dma mode, all devices
> > belonging to the same iommu group shouldn't be bound with a kernel-dma
> > driver. Kevin's prototype checks this by READ_ONCE(dev->driver). This is
> > not lock safe as discussed below,
> >
> > https://lore.kernel.org/linux-iommu/20210927130935.gz964...@nvidia.com/
> >
> > Any guidance on this?
>
> Something like this?

Yes. With this group-level atomic we don't need to loop over every
dev->driver separately.

> int iommu_set_device_dma_owner(struct device *dev, enum device_dma_owner mode,
> 			       struct file *user_owner)
> {
> 	struct iommu_group *group = group_from_dev(dev);
>
> 	spin_lock(&iommu_group->dma_owner_lock);
> 	switch (mode) {
> 	case DMA_OWNER_KERNEL:
> 		if (iommu_group->dma_users[DMA_OWNER_USERSPACE])
> 			return -EBUSY;
> 		break;
> 	case DMA_OWNER_SHARED:
> 		break;
> 	case DMA_OWNER_USERSPACE:
> 		if (iommu_group->dma_users[DMA_OWNER_KERNEL])
> 			return -EBUSY;
> 		if (iommu_group->dma_owner_file != user_owner) {
> 			if (iommu_group->dma_users[DMA_OWNER_USERSPACE])
> 				return -EPERM;
> 			get_file(user_owner);
> 			iommu_group->dma_owner_file = user_owner;
> 		}
> 		break;
> 	default:
> 		spin_unlock(&iommu_group->dma_owner_lock);
> 		return -EINVAL;
> 	}
> 	iommu_group->dma_users[mode]++;
> 	spin_unlock(&iommu_group->dma_owner_lock);
> 	return 0;
> }
>
> int iommu_release_device_dma_owner(struct device *dev,
> 				   enum device_dma_owner mode)
> {
> 	struct iommu_group *group = group_from_dev(dev);
>
> 	spin_lock(&iommu_group->dma_owner_lock);
> 	if (WARN_ON(!iommu_group->dma_users[mode]))
> 		goto err_unlock;
> 	if (!iommu_group->dma_users[mode]--) {
> 		if (mode == DMA_OWNER_USERSPACE) {
> 			fput(iommu_group->dma_owner_file);
> 			iommu_group->dma_owner_file = NULL;
> 		}
> 	}
> err_unlock:
> 	spin_unlock(&iommu_group->dma_owner_lock);
> }
>
> Where, the driver core does before probe:
> 	iommu_set_device_dma_owner(dev, DMA_OWNER_KERNEL, NULL)
>
> pci_stub/etc does in their probe func:
> 	iommu_set_device_dma_owner(dev, DMA_OWNER_SHARED, NULL)
>
> And vfio/iommufd does when a struct vfio_device FD is attached:
> 	iommu_set_device_dma_owner(dev, DMA_OWNER_USERSPACE, group_file/iommu_file)

Just a nit. Per your comment in a previous mail:

	/* If set the driver must call iommu_XX as the first action in probe() */
	bool suppress_dma_owner:1;

Following the above logic, userspace drivers won't call iommu_XX in probe().
Just want to double confirm whether you see any issue with this relaxed
behavior. If there is no problem:

	/* If set the driver must call iommu_XX as the first action in probe() or
	 * before it attempts to do DMA */
	bool suppress_dma_owner:1;

Thanks
Kevin
RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
Hi, Tony,

> void *begin_update_one_xsave_feature(struct task_struct *tsk,
> 				     enum xfeature xfeature, bool full)
> {
> 	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
> 	struct xregs_state *xinit = &init_fpstate.xsave;
> 	u64 fmask = 1ull << xfeature;
> 	void *addr;
>
> 	BUG_ON(!(xsave->header.xcomp_bv & fmask));
>
> 	fpregs_lock();

I'm afraid we may hit the same locking issue when we send IPI to notify
another task to modify its PASID state. Here the API is called to modify
another running task's PASID state as well without a right lock.
fpregs_lock() is not enough to deal with this, I'm afraid.

Quote from Thomas in
https://lore.kernel.org/linux-iommu/87mtsd6gr9@nanos.tec.linutronix.de/
"FPU state of a running task is protected by fregs_lock() which is nothing
else than a local_bh_disable(). As BH disabled regions run usually with
interrupts enabled the IPI can hit a code section which modifies FPU state
and there is absolutely no guarantee that any of the assumptions which are
made for the IPI case is true."

Maybe restrict the API's scope: don't modify another task's FPU state, just
the current task's state?

> 	addr = __raw_xsave_addr(xsave, xfeature);
>
> 	if (full || tsk != current) {
> 		memcpy(addr, __raw_xsave_addr(xinit, xfeature),
> 		       xstate_sizes[xfeature]);
> 		goto out;
> 	}
>
> 	if (!(xsave->header.xfeatures & fmask)) {
> 		xsave->header.xfeatures |= fmask;	//<
> 		xsaves(xsave, fmask);
> 	}
>
> out:
> 	xsave->header.xfeatures |= fmask;
> 	return addr;
> }
>
> void finish_update_one_xsave_feature(struct task_struct *tsk)
> {
> 	set_ti_thread_flag(task_thread_info(tsk), TIF_NEED_FPU_LOAD);
> 	if (tsk == current)			//<
> 		__cpu_invalidate_fpregs_state();	//<
> 	fpregs_unlock();
> }

Thanks.

-Fenghua
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
On Tue, Sep 28, 2021 at 11:50:37PM +0000, Fenghua Yu wrote:
> If the xfeature's feature bit is 0, xsaves will not write its init value to
> the memory due to the init optimization. So the xsaves will do nothing and
> the state is not initialized and may have random data.
>
> Setting TIF_NEED_FPU_LOAD is not guaranteed to get XRSTORS executed on
> exiting to user. In fpregs_restore_userregs():
> 	if (!fpregs_state_valid(fpu, cpu)) {
> 		...
> 		__restore_fpregs_from_fpstate(&fpu->state, mask);
> 		...
> 	}
>
> The fpregs state has to be invalid for the XRSTORS to be executed.
>
> So setting TIF_NEED_FPU_LOAD may leave the FPU registers unchanged on
> exiting to user.

Does this help? Changed lines marked with //<

-Tony

void *begin_update_one_xsave_feature(struct task_struct *tsk,
				     enum xfeature xfeature, bool full)
{
	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
	struct xregs_state *xinit = &init_fpstate.xsave;
	u64 fmask = 1ull << xfeature;
	void *addr;

	BUG_ON(!(xsave->header.xcomp_bv & fmask));

	fpregs_lock();

	addr = __raw_xsave_addr(xsave, xfeature);

	if (full || tsk != current) {
		memcpy(addr, __raw_xsave_addr(xinit, xfeature),
		       xstate_sizes[xfeature]);
		goto out;
	}

	if (!(xsave->header.xfeatures & fmask)) {
		xsave->header.xfeatures |= fmask;	//<
		xsaves(xsave, fmask);
	}

out:
	xsave->header.xfeatures |= fmask;
	return addr;
}

void finish_update_one_xsave_feature(struct task_struct *tsk)
{
	set_ti_thread_flag(task_thread_info(tsk), TIF_NEED_FPU_LOAD);
	if (tsk == current)			//<
		__cpu_invalidate_fpregs_state();	//<
	fpregs_unlock();
}
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
Hi, Tony,

On Tue, Sep 28, 2021 at 04:10:39PM -0700, Luck, Tony wrote:
> Moving beyond pseudo-code and into compiles-but-probably-broken-code.
>
> The intent of the functions below is that Fenghua should be able to do:
>
> void fpu__pasid_write(u32 pasid)
> {
> 	u64 msr_val = pasid | MSR_IA32_PASID_VALID;
> 	struct ia32_pasid_state *addr;
>
> 	addr = begin_update_one_xsave_feature(current, XFEATURE_PASID, true);
> 	addr->pasid = msr_val;
> 	finish_update_one_xsave_feature(current);
> }
>
> So here's the two new functions that would be added to
> arch/x86/kernel/fpu/xstate.c
>
> void *begin_update_one_xsave_feature(struct task_struct *tsk,
> 				     enum xfeature xfeature, bool full)
> {
> 	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
> 	struct xregs_state *xinit = &init_fpstate.xsave;
> 	u64 fmask = 1ull << xfeature;
> 	void *addr;
>
> 	BUG_ON(!(xsave->header.xcomp_bv & fmask));
>
> 	fpregs_lock();
>
> 	addr = __raw_xsave_addr(xsave, xfeature);
>
> 	if (full || tsk != current) {
> 		memcpy(addr, __raw_xsave_addr(xinit, xfeature),
> 		       xstate_sizes[xfeature]);
> 		goto out;
> 	}
>
> 	/* could optimize some cases where xsaves() isn't fastest option */
> 	if (!(xsave->header.xfeatures & fmask))
> 		xsaves(xsave, fmask);

If the xfeature's feature bit in xfeatures is 0, xsaves will not write its
init value to the memory due to the init optimization. So the xsaves will
do nothing, and the state is not initialized and may have random data.

>
> out:
> 	xsave->header.xfeatures |= fmask;
> 	return addr;
> }
>
> void finish_update_one_xsave_feature(struct task_struct *tsk)
> {
> 	set_ti_thread_flag(task_thread_info(tsk), TIF_NEED_FPU_LOAD);

Setting TIF_NEED_FPU_LOAD is not guaranteed to get XRSTORS executed on
exiting to user. In fpregs_restore_userregs():

	if (!fpregs_state_valid(fpu, cpu)) {
		...
		__restore_fpregs_from_fpstate(&fpu->state, mask);
		...
	}

The fpregs state has to be invalid for the XRSTORS to be executed. So
setting TIF_NEED_FPU_LOAD may leave the FPU registers unchanged on exiting
to user.

> 	fpregs_unlock();
> }

Thanks.
-Fenghua
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
Moving beyond pseudo-code and into compiles-but-probably-broken-code.

The intent of the functions below is that Fenghua should be able to do:

void fpu__pasid_write(u32 pasid)
{
	u64 msr_val = pasid | MSR_IA32_PASID_VALID;
	struct ia32_pasid_state *addr;

	addr = begin_update_one_xsave_feature(current, XFEATURE_PASID, true);
	addr->pasid = msr_val;
	finish_update_one_xsave_feature(current);
}

So here's the two new functions that would be added to
arch/x86/kernel/fpu/xstate.c

void *begin_update_one_xsave_feature(struct task_struct *tsk,
				     enum xfeature xfeature, bool full)
{
	struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
	struct xregs_state *xinit = &init_fpstate.xsave;
	u64 fmask = 1ull << xfeature;
	void *addr;

	BUG_ON(!(xsave->header.xcomp_bv & fmask));

	fpregs_lock();

	addr = __raw_xsave_addr(xsave, xfeature);

	if (full || tsk != current) {
		memcpy(addr, __raw_xsave_addr(xinit, xfeature),
		       xstate_sizes[xfeature]);
		goto out;
	}

	/* could optimize some cases where xsaves() isn't fastest option */
	if (!(xsave->header.xfeatures & fmask))
		xsaves(xsave, fmask);

out:
	xsave->header.xfeatures |= fmask;
	return addr;
}

void finish_update_one_xsave_feature(struct task_struct *tsk)
{
	set_ti_thread_flag(task_thread_info(tsk), TIF_NEED_FPU_LOAD);
	fpregs_unlock();
}

-Tony
[PATCH][next] iommu/dma: Use kvcalloc() instead of kvzalloc()
Use the 2-factor argument form kvcalloc() instead of kvzalloc().

Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Gustavo A. R. Silva
---
 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 896bea04c347..18c6edbe5fbf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -616,7 +616,7 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 	if (!order_mask)
 		return NULL;
 
-	pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL);
+	pages = kvcalloc(count, sizeof(*pages), GFP_KERNEL);
 	if (!pages)
 		return NULL;
-- 
2.27.0
Re: [PATCH v3 4/20] PCI/P2PDMA: introduce helpers for dma_map_sg implementations
On Thu, Sep 16, 2021 at 05:40:44PM -0600, Logan Gunthorpe wrote:
> +enum pci_p2pdma_map_type
> +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
> +		       struct scatterlist *sg)
> +{
> +	if (state->pgmap != sg_page(sg)->pgmap) {
> +		state->pgmap = sg_page(sg)->pgmap;

This has built into it an assumption that every page in the sg element has
the same pgmap, but AFAIK nothing enforces this rule? There is no
requirement that the HW has pfn gaps between the pgmaps linux decides to
create over it.

At least sg_alloc_append_table_from_pages() and probably something in the
block world should be updated to not combine struct pages with different
pgmaps, and this should be documented in scatterlist.* someplace.

Jason
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On Tue, Sep 28, 2021 at 02:01:57PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Yes. But, since the check is related to TDX, I just want to confirm whether
> you are fine with naming the function as intel_*().

Why is this such a big of a deal?! There's amd_cc_platform_has() and
intel_cc_platform_has() will be the corresponding Intel version.

> Since this patch is going to have dependency on TDX code, I will include
> this patch in TDX patch set.

Ok.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On 9/28/21 1:58 PM, Borislav Petkov wrote:
> On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote:
>> Just read it. If you want to use cpuid_has_tdx_guest() directly in
>> cc_platform_has(), then you want to rename intel_cc_platform_has() to
>> tdx_cc_platform_has()?
>
> Why? You simply do:
>
> 	if (cpuid_has_tdx_guest())
> 		intel_cc_platform_has(...);
>
> and lemme paste from that mail:
>
> "...you should use cpuid_has_tdx_guest() instead but cache its result so
> that you don't call CPUID each time the kernel executes cc_platform_has()."
>
> Makes sense?

Yes. But, since the check is related to TDX, I just want to confirm whether
you are fine with naming the function as intel_*().

Since this patch is going to have a dependency on TDX code, I will include
this patch in the TDX patch set.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Just read it. If you want to use cpuid_has_tdx_guest() directly in
> cc_platform_has(), then you want to rename intel_cc_platform_has() to
> tdx_cc_platform_has()?

Why? You simply do:

	if (cpuid_has_tdx_guest())
		intel_cc_platform_has(...);

and lemme paste from that mail:

"...you should use cpuid_has_tdx_guest() instead but cache its result so
that you don't call CPUID each time the kernel executes cc_platform_has()."

Makes sense?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
On 9/28/21 1:28 PM, Luck, Tony wrote: > On Tue, Sep 28, 2021 at 12:19:22PM -0700, Dave Hansen wrote: >> On 9/28/21 11:50 AM, Luck, Tony wrote: >>> On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote: >> ... 1. Hide whether we need to write to real registers 2. Hide whether we need to update the in-memory image 3. Hide other FPU infrastructure like the TIF flag. 4. Make the users deal with a *whole* state in the replace API >>> >>> Is that difference just whether you need to save the >>> state from registers to memory (for the "update" case) >>> or not (for the "replace" case ... where you can ignore >>> the current register, overwrite the whole per-feature >>> xsave area and mark it to be restored to registers). >>> >>> If so, just a "bool full" argument might do the trick? >> >> I want to be able to hide the complexity of where the old state comes >> from. It might be in registers or it might be in memory or it might be >> *neither*. It's possible we're running with stale register state and a >> current->...->xsave buffer that has XFEATURES_FOO 0. >> >> In that case, the "old" copy might be memcpy'd out of the init_task. >> Or, for pkeys, we might build it ourselves with init_pkru_val. > > So should there be an error case if there isn't an "old" state, and > the user calls: > > p = begin_update_one_xsave_feature(XFEATURE_something, false); > > Maybe instead of an error, just fill it in with the init state for the > feature? Yes, please. Let's not generate an error. Let's populate the init state and let them move on with life. >>> pseudo-code: >>> >>> void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full) >>> { >>> void *addr; >>> >>> BUG_ON(!(xsave->header.xcomp_bv & xfeature)); >>> >>> addr = __raw_xsave_addr(xsave, xfeature); >>> >>> fpregs_lock(); >>> >>> if (full) >>> return addr; >> >> If the feature is marked as in the init state in the buffer >> (XSTATE_BV[feature]==0), this addr *could* contain total garbage. 
So,
>> we'd want to make sure that the memory contents have the init state
>> written before handing them back to the caller. That's not strictly
>> required if the user is writing the whole thing, but it's the nice thing
>> to do.
>
> Nice guys waste CPU cycles writing to memory that is just going to get
> written again.

Let's skip the "replace" operation for now and focus on "update". A full
replace *can* be faster because it doesn't require the state to be written
out. But, we don't need that for now.

Let's just do the "update" thing, and let's ensure that we reflect the init
state into the buffer that gets returned if the feature was in its init
state. Sound good?

>>> 	if (xfeature registers are "live")
>>> 		xsaves(xstate, 1 << xfeature);
>>
>> One little note: I don't think we would necessarily need to do an XSAVES
>> here. For PKRU, for instance, we could just do a rdpkru.
>
> Like this?
>
> 	if (tsk == current) {
> 		switch (xfeature) {
> 		case XFEATURE_PKRU:
> 			*(u32 *)addr = rdpkru();
> 			break;
> 		case XFEATURE_PASID:
> 			rdmsrl(MSR_IA32_PASID, msr);
> 			*(u64 *)addr = msr;
> 			break;
> 		... any other "easy" states ...
> 		default:
> 			xsaves(xstate, 1 << xfeature);
> 			break;
> 		}
> 	}

Yep.

>>> 	return addr;
>>> }
>>>
>>> void finish_update_one_xsave_feature(enum xfeature xfeature)
>>> {
>>> 	mark feature modified
>>
>> I think we'd want to do this at the "begin" time. Also, do you mean we
>> should set XSTATE_BV[feature]?
>
> Begin? End? It's all inside fpregs_lock(). But whatever seems best.

I'm actually waffling about it. Does XSTATE_BV[feature] mean that state
*is* there or that state *may* be there? Either way works.

>> It's also worth noting that we *could*:
>>
>> 	xrstors(xstate, 1 << xfeature);
>>
>> as well. That would bring the registers back up to date and we could
>> keep TIF_NEED_FPU_LOAD==0.
>
> Only makes sense if "tsk == current". But does this help? The work seems
> to be the same whether we do it now, or later. We don't know for sure
> that we will directly return to the task. We might context switch to
We might context switch to > another task, so loading the state into registers now would just be > wasted time. True, but the other side of the coin is that setting TIF_NEED_FPU_LOAD subjects us to an XRSTOR on the way out to userspace. That XRSTOR might just be updating one state component in practice. Either way, sorry for the distraction. We (me) should really be focusing on getting something that is functional but slow. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On 9/28/21 1:31 PM, Borislav Petkov wrote: On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote: Intel CC support patch is not included in this series. You want me to address the issue raised by Joerg before merging it? Did you not see my email to you today: https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic Just read it. If you want to use cpuid_has_tdx_guest() directly in cc_platform_has(), then you want to rename intel_cc_platform_has() to tdx_cc_platform_has()? ? -- Sathyanarayanan Kuppuswamy Linux Kernel Developer ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote: > Intel CC support patch is not included in this series. You want me > to address the issue raised by Joerg before merging it? Did you not see my email to you today: https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic ? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
On Tue, Sep 28, 2021 at 12:19:22PM -0700, Dave Hansen wrote:
> On 9/28/21 11:50 AM, Luck, Tony wrote:
> > On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote:
> ...
> >> 1. Hide whether we need to write to real registers
> >> 2. Hide whether we need to update the in-memory image
> >> 3. Hide other FPU infrastructure like the TIF flag.
> >> 4. Make the users deal with a *whole* state in the replace API
> >
> > Is that difference just whether you need to save the
> > state from registers to memory (for the "update" case)
> > or not (for the "replace" case ... where you can ignore
> > the current register, overwrite the whole per-feature
> > xsave area and mark it to be restored to registers)?
> >
> > If so, just a "bool full" argument might do the trick?
>
> I want to be able to hide the complexity of where the old state comes
> from. It might be in registers or it might be in memory or it might be
> *neither*. It's possible we're running with stale register state and a
> current->...->xsave buffer that has XFEATURES_FOO 0.
>
> In that case, the "old" copy might be memcpy'd out of the init_task.
> Or, for pkeys, we might build it ourselves with init_pkru_val.

So should there be an error case if there isn't an "old" state, and the
user calls:

	p = begin_update_one_xsave_feature(XFEATURE_something, false);

Maybe instead of an error, just fill it in with the init state for the
feature?

> > Also - you have a "tsk" argument in your pseudo code. Is
> > this needed? Are there places where we need to perform
> > these operations on something other than "current"?
>
> Two cases come to mind:
> 1. Fork/clone where we are doing things to our child's XSAVE buffer
> 2. ptrace() where we are poking into another task's state
>
> ptrace() goes for the *whole* buffer now. I'm not sure it would need
> this per-feature API. I just call it out as something that we might
> need in the future.

Ok - those seem ok ... it is up to the caller to make sure that the
target task is in some "not running, and can't suddenly start running"
state before calling these functions.

> > pseudo-code:
> >
> > void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full)
> > {
> > 	void *addr;
> >
> > 	BUG_ON(!(xsave->header.xcomp_bv & xfeature));
> >
> > 	addr = __raw_xsave_addr(xsave, xfeature);
> >
> > 	fpregs_lock();
> >
> > 	if (full)
> > 		return addr;
>
> If the feature is marked as in the init state in the buffer
> (XSTATE_BV[feature]==0), this addr *could* contain total garbage. So,
> we'd want to make sure that the memory contents have the init state
> written before handing them back to the caller. That's not strictly
> required if the user is writing the whole thing, but it's the nice
> thing to do.

Nice guys waste CPU cycles writing to memory that is just going to get
written again.

> > 	if (xfeature registers are "live")
> > 		xsaves(xstate, 1 << xfeature);
>
> One little note: I don't think we would necessarily need to do an
> XSAVES here. For PKRU, for instance, we could just do a rdpkru.

Like this?

	if (tsk == current) {
		switch (xfeature) {
		case XFEATURE_PKRU:
			*(u32 *)addr = rdpkru();
			break;
		case XFEATURE_PASID:
			rdmsrl(MSR_IA32_PASID, msr);
			*(u64 *)addr = msr;
			break;
		... any other "easy" states ...
		default:
			xsaves(xstate, 1 << xfeature);
			break;
		}
	}

> > 	return addr;
> > }
> >
> > void finish_update_one_xsave_feature(enum xfeature xfeature)
> > {
> > 	mark feature modified
>
> I think we'd want to do this at the "begin" time. Also, do you mean we
> should set XSTATE_BV[feature]?

Begin? End? It's all inside fpregs_lock(). But whatever seems best.

Yes, I think that this means set XSTATE_BV[feature] ... but I'm relying
on you as the xsave expert to help get the subtle bits right so that
Andy Lutomirski can smile at this code.

> > 	set TIF bit
>
> Since the XSAVE buffer was updated, it now contains the canonical FPU
> state. It may have diverged from the register state, thus we need to
> set TIF_NEED_FPU_LOAD.

Yes, that's the TIF bit my pseudo-code intended.

> It's also worth noting that we *could*:
>
> 	xrstors(xstate, 1 << xfeature);
>
> as well. That would bring the registers back up to date and we could
> keep TIF_NEED_FPU_LOAD==0. Only makes sense if "tsk == current".

But does this help? The work seems to be the same whether we do it now
or later. We don't know for sure that we will directly return to the
task. We might context switch to another task, so loading the state
into registers now would just be wasted time.

> > 	fpregs_unlock();
> > }

-Tony
Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
On Thu, Sep 16, 2021 at 05:40:59PM -0600, Logan Gunthorpe wrote:
> +static void pci_p2pdma_unmap_mappings(void *data)
> +{
> +	struct pci_dev *pdev = data;
> +	struct pci_p2pdma *p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
> +
> +	p2pdma->active = false;
> +	synchronize_rcu();
> +	unmap_mapping_range(p2pdma->inode->i_mapping, 0, 0, 1);
> +	pci_p2pdma_free_mappings(p2pdma->inode->i_mapping);
> +}

If this is going to rely on unmap_mapping_range then GUP should also
reject this memory for FOLL_LONGTERM..

What along this control flow:

> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings,
> +					 pdev);

Waits for all the page refcounts to go to zero?

Jason
Re: [PATCH v3 00/20] Userspace P2PDMA with O_DIRECT NVMe devices
On Thu, Sep 16, 2021 at 05:40:40PM -0600, Logan Gunthorpe wrote:
> Hi,
>
> This patchset continues my work to add userspace P2PDMA access using
> O_DIRECT NVMe devices. My last posting[1] just included the first 13
> patches in this series, but the early P2PDMA cleanup and map_sg error
> changes from that series have been merged into v5.15-rc1. To address
> concerns that that series did not add any new functionality, I've added
> back the userspace functionality from the original RFC[2] (but improved
> based on the original feedback).

I really think this is the best series yet, it really looks nice
overall. I know the sg flag was a bit of a debate at the start, but it
serves an undeniable purpose and the resulting standard DMA APIs 'just
working' is really clean.

There is more possible here, we could also pass the new GUP flag in the
ib_umem code.. After this gets merged I would make a series to split out
the CMD genalloc related stuff and try and probably get something like
VFIO to export this kind of memory as well, then it would have pretty
nice coverage.

Jason
Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
On Thu, Sep 16, 2021 at 05:40:59PM -0600, Logan Gunthorpe wrote:
> +int pci_mmap_p2pmem(struct pci_dev *pdev, struct vm_area_struct *vma)
> +{
> +	struct pci_p2pdma_map *pmap;
> +	struct pci_p2pdma *p2pdma;
> +	int ret;
> +
> +	/* prevent private mappings from being established */
> +	if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
> +		pci_info_ratelimited(pdev,
> +				     "%s: fail, attempted private mapping\n",
> +				     current->comm);
> +		return -EINVAL;
> +	}
> +
> +	pmap = pci_p2pdma_map_alloc(pdev, vma->vm_end - vma->vm_start);
> +	if (!pmap)
> +		return -ENOMEM;
> +
> +	rcu_read_lock();
> +	p2pdma = rcu_dereference(pdev->p2pdma);
> +	if (!p2pdma) {
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	ret = simple_pin_fs(&pci_p2pdma_fs_type, &pci_p2pdma_fs_mnt,
> +			    &pci_p2pdma_fs_cnt);
> +	if (ret)
> +		goto out;
> +
> +	ihold(p2pdma->inode);
> +	pmap->inode = p2pdma->inode;
> +	rcu_read_unlock();
> +
> +	vma->vm_flags |= VM_MIXEDMAP;

Why is this a VM_MIXEDMAP? Everything fault sticks in here has a struct
page, right?

Jason
Re: [PATCH v3 14/20] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
On Thu, Sep 16, 2021 at 05:40:54PM -0600, Logan Gunthorpe wrote:
> Callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to
> allow obtaining P2PDMA pages. If a caller does not set this flag
> and tries to map P2PDMA pages it will fail.
>
> This is implemented by adding a flag and a check to get_dev_pagemap().

I would like to see the get_dev_pagemap() deleted from GUP in the first
place.

Why isn't this just a simple check of the page->pgmap type after
acquiring a valid page reference? See my prior note

Jason
Re: [PATCH v3 13/20] PCI/P2PDMA: remove pci_p2pdma_[un]map_sg()
On Thu, Sep 16, 2021 at 05:40:53PM -0600, Logan Gunthorpe wrote:
> This interface is superseded by support in dma_map_sg() which now supports
> heterogeneous scatterlists. There are no longer any users, so remove it.
>
> Signed-off-by: Logan Gunthorpe
> ---
>  drivers/pci/p2pdma.c       | 65 --
>  include/linux/pci-p2pdma.h | 27 
>  2 files changed, 92 deletions(-)

Reviewed-by: Jason Gunthorpe

Good riddance :)

Jason
Re: [PATCH v3 12/20] RDMA/rw: use dma_map_sgtable()
On Thu, Sep 16, 2021 at 05:40:52PM -0600, Logan Gunthorpe wrote:
> dma_map_sg() now supports the use of P2PDMA pages so pci_p2pdma_map_sg()
> is no longer necessary and may be dropped.
>
> Switch to the dma_map_sgtable() interface which will allow for better
> error reporting if the P2PDMA pages are unsupported.
>
> The change to sgtable also appears to fix a couple subtle error path
> bugs:
>
>   - In rdma_rw_ctx_init(), dma_unmap would be called with an sg
>     that could have been incremented from the original call, as
>     well as an nents that was not the original number of nents
>     called when mapped.
>   - Similarly in rdma_rw_ctx_signature_init, both sg and prot_sg
>     were unmapped with the incorrect number of nents.

Those bugs should definitely get fixed.. I might extract the sgtable
conversion into a stand alone patch to do it.

But as it is, this looks fine

Reviewed-by: Jason Gunthorpe

Jason
Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function
On 9/28/21 12:10 PM, Borislav Petkov wrote:
> From: Borislav Petkov
>
> Hi all,
>
> here's v4 of the cc_platform_has() patchset with feedback incorporated.
>
> I'm going to route this through tip if there are no objections.

Intel CC support patch is not included in this series. You want me
to address the issue raised by Joerg before merging it?

> Thx.
>
> Tom Lendacky (8):
>   x86/ioremap: Selectively build arch override encryption functions
>   arch/cc: Introduce a function to check for confidential computing features
>   x86/sev: Add an x86 version of cc_platform_has()
>   powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
>   x86/sme: Replace occurrences of sme_active() with cc_platform_has()
>   x86/sev: Replace occurrences of sev_active() with cc_platform_has()
>   x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
>   treewide: Replace the use of mem_encrypt_active() with cc_platform_has()
>
>  arch/Kconfig                                 |  3 +
>  arch/powerpc/include/asm/mem_encrypt.h       |  5 --
>  arch/powerpc/platforms/pseries/Kconfig       |  1 +
>  arch/powerpc/platforms/pseries/Makefile      |  2 +
>  arch/powerpc/platforms/pseries/cc_platform.c | 26 ++
>  arch/powerpc/platforms/pseries/svm.c         |  5 +-
>  arch/s390/include/asm/mem_encrypt.h          |  2 -
>  arch/x86/Kconfig                             |  1 +
>  arch/x86/include/asm/io.h                    |  8 ++
>  arch/x86/include/asm/kexec.h                 |  2 +-
>  arch/x86/include/asm/mem_encrypt.h           | 12 +--
>  arch/x86/kernel/Makefile                     |  6 ++
>  arch/x86/kernel/cc_platform.c                | 69 +++
>  arch/x86/kernel/crash_dump_64.c              |  4 +-
>  arch/x86/kernel/head64.c                     |  9 +-
>  arch/x86/kernel/kvm.c                        |  3 +-
>  arch/x86/kernel/kvmclock.c                   |  4 +-
>  arch/x86/kernel/machine_kexec_64.c           | 19 +++--
>  arch/x86/kernel/pci-swiotlb.c                |  9 +-
>  arch/x86/kernel/relocate_kernel_64.S         |  2 +-
>  arch/x86/kernel/sev.c                        |  6 +-
>  arch/x86/kvm/svm/svm.c                       |  3 +-
>  arch/x86/mm/ioremap.c                        | 18 ++--
>  arch/x86/mm/mem_encrypt.c                    | 55 
>  arch/x86/mm/mem_encrypt_identity.c           |  9 +-
>  arch/x86/mm/pat/set_memory.c                 |  3 +-
>  arch/x86/platform/efi/efi_64.c               |  9 +-
>  arch/x86/realmode/init.c                     |  8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c      |  4 +-
>  drivers/gpu/drm/drm_cache.c                  |  4 ++-
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c          |  4 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c          |  6 +-
>  drivers/iommu/amd/init.c                     |  7 +-
>  drivers/iommu/amd/iommu.c                    |  3 +-
>  drivers/iommu/amd/iommu_v2.c                 |  3 +-
>  drivers/iommu/iommu.c                        |  3 +-
>  fs/proc/vmcore.c                             |  6 +-
>  include/linux/cc_platform.h                  | 88 
>  include/linux/mem_encrypt.h                  |  4 -
>  kernel/dma/swiotlb.c                         |  4 ++--
>  40 files changed, 310 insertions(+), 129 deletions(-)
>  create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c
>  create mode 100644 arch/x86/kernel/cc_platform.c
>  create mode 100644 include/linux/cc_platform.h

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
On 9/28/21 11:50 AM, Luck, Tony wrote:
> On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote:
...
>> 1. Hide whether we need to write to real registers
>> 2. Hide whether we need to update the in-memory image
>> 3. Hide other FPU infrastructure like the TIF flag.
>> 4. Make the users deal with a *whole* state in the replace API
>
> Is that difference just whether you need to save the
> state from registers to memory (for the "update" case)
> or not (for the "replace" case ... where you can ignore
> the current register, overwrite the whole per-feature
> xsave area and mark it to be restored to registers).
>
> If so, just a "bool full" argument might do the trick?

I want to be able to hide the complexity of where the old state comes
from.  It might be in registers or it might be in memory or it might be
*neither*.  It's possible we're running with stale register state and a
current->...->xsave buffer that has XFEATURES_FOO 0.

In that case, the "old" copy might be memcpy'd out of the init_task.
Or, for pkeys, we might build it ourselves with init_pkru_val.

> Also - you have a "tsk" argument in your pseudo code. Is
> this needed? Are there places where we need to perform
> these operations on something other than "current"?

Two cases come to mind:
1. Fork/clone where we are doing things to our child's XSAVE buffer
2. ptrace() where we are poking into another task's state

ptrace() goes for the *whole* buffer now.  I'm not sure it would need
this per-feature API.  I just call it out as something that we might
need in the future.

> pseudo-code:
>
> void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full)
> {
> 	void *addr;
>
> 	BUG_ON(!(xsave->header.xcomp_bv & xfeature));
>
> 	addr = __raw_xsave_addr(xsave, xfeature);
>
> 	fpregs_lock();
>
> 	if (full)
> 		return addr;

If the feature is marked as in the init state in the buffer
(XSTATE_BV[feature]==0), this addr *could* contain total garbage.  So,
we'd want to make sure that the memory contents have the init state
written before handing them back to the caller.  That's not strictly
required if the user is writing the whole thing, but it's the nice thing
to do.

> 	if (xfeature registers are "live")
> 		xsaves(xstate, 1 << xfeature);

One little note: I don't think we would necessarily need to do an XSAVES
here.  For PKRU, for instance, we could just do a rdpkru.

> 	return addr;
> }
>
> void finish_update_one_xsave_feature(enum xfeature xfeature)
> {
> 	mark feature modified

I think we'd want to do this at the "begin" time.  Also, do you mean we
should set XSTATE_BV[feature]?

> 	set TIF bit

Since the XSAVE buffer was updated, it now contains the canonical FPU
state.  It may have diverged from the register state, thus we need to
set TIF_NEED_FPU_LOAD.

It's also worth noting that we *could*:

	xrstors(xstate, 1 << xfeature);

as well.  That would bring the registers back up to date and we could
keep TIF_NEED_FPU_LOAD==0.  Only makes sense if "tsk == current".

> 	fpregs_unlock();
> }
>
> -Tony
Re: [PATCH v3 11/20] RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
On Thu, Sep 16, 2021 at 05:40:51PM -0600, Logan Gunthorpe wrote:
> Introduce the helper function ib_dma_pci_p2p_dma_supported() to check
> if a given ib_device can be used in P2PDMA transfers. This ensures
> the ib_device is not using virt_dma and also that the underlying
> dma_device supports P2PDMA.
>
> Use the new helper in nvme-rdma to replace the existing check for
> ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows
> switching away from pci_p2pdma_[un]map_sg().
>
> Signed-off-by: Logan Gunthorpe
> ---
>  drivers/nvme/target/rdma.c |  2 +-
>  include/rdma/ib_verbs.h    | 11 +++
>  2 files changed, 12 insertions(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe

> +/*
> + * Check if a IB device's underlying DMA mapping supports P2PDMA transfers.
> + */
> +static inline bool ib_dma_pci_p2p_dma_supported(struct ib_device *dev)
> +{
> +	if (ib_uses_virt_dma(dev))
> +		return false;

If someone wants to make rxe/hfi/qib use this stuff then they will have
to teach the driver to do all the p2p checks and add some struct
ib_device flag

Jason
Re: [PATCH v3 08/20] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
On Thu, Sep 16, 2021 at 05:40:48PM -0600, Logan Gunthorpe wrote:
> When a PCI P2PDMA page is seen, set the IOVA length of the segment
> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
> apply the appropriate bus address to the segment. The IOVA is not
> created if the scatterlist only consists of P2PDMA pages.
>
> A P2PDMA page may have three possible outcomes when being mapped:
>   1) If the data path between the two devices doesn't go through
>      the root port, then it should be mapped with a PCI bus address
>   2) If the data path goes through the host bridge, it should be mapped
>      normally with an IOMMU IOVA.
>   3) It is not possible for the two devices to communicate and thus
>      the mapping operation should fail (and it will return -EREMOTEIO).
>
> Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
> indicate bus address segments. On unmap, P2PDMA segments are skipped
> over when determining the start and end IOVA addresses.
>
> With this change, the flags variable in the dma_map_ops is set to
> DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.
>
> Signed-off-by: Logan Gunthorpe
> ---
>  drivers/iommu/dma-iommu.c | 68 +++
>  1 file changed, 61 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe

Jason
Re: [PATCH v3 07/20] dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
On Thu, Sep 16, 2021 at 05:40:47PM -0600, Logan Gunthorpe wrote:
> Add a flags member to the dma_map_ops structure with one flag to
> indicate support for PCI P2PDMA.
>
> Also, add a helper to check if a device supports PCI P2PDMA.
>
> Signed-off-by: Logan Gunthorpe
> ---
>  include/linux/dma-map-ops.h | 10 ++
>  include/linux/dma-mapping.h |  5 +
>  kernel/dma/mapping.c        | 18 ++
>  3 files changed, 33 insertions(+)

Reviewed-by: Jason Gunthorpe

Jason
[PATCH 8/8] treewide: Replace the use of mem_encrypt_active() with cc_platform_has()
From: Tom Lendacky Replace uses of mem_encrypt_active() with calls to cc_platform_has() with the CC_ATTR_MEM_ENCRYPT attribute. Remove the implementation of mem_encrypt_active() across all arches. For s390, since the default implementation of the cc_platform_has() matches the s390 implementation of mem_encrypt_active(), cc_platform_has() does not need to be implemented in s390 (the config option ARCH_HAS_CC_PLATFORM is not set). Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/powerpc/include/asm/mem_encrypt.h | 5 - arch/powerpc/platforms/pseries/svm.c| 5 +++-- arch/s390/include/asm/mem_encrypt.h | 2 -- arch/x86/include/asm/mem_encrypt.h | 5 - arch/x86/kernel/head64.c| 9 +++-- arch/x86/mm/ioremap.c | 4 ++-- arch/x86/mm/mem_encrypt.c | 2 +- arch/x86/mm/pat/set_memory.c| 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++- drivers/gpu/drm/drm_cache.c | 4 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +++--- drivers/iommu/amd/iommu.c | 3 ++- drivers/iommu/amd/iommu_v2.c| 3 ++- drivers/iommu/iommu.c | 3 ++- fs/proc/vmcore.c| 6 +++--- include/linux/mem_encrypt.h | 4 kernel/dma/swiotlb.c| 4 ++-- 18 files changed, 36 insertions(+), 40 deletions(-) diff --git a/arch/powerpc/include/asm/mem_encrypt.h b/arch/powerpc/include/asm/mem_encrypt.h index ba9dab07c1be..2f26b8fc8d29 100644 --- a/arch/powerpc/include/asm/mem_encrypt.h +++ b/arch/powerpc/include/asm/mem_encrypt.h @@ -10,11 +10,6 @@ #include -static inline bool mem_encrypt_active(void) -{ - return is_secure_guest(); -} - static inline bool force_dma_unencrypted(struct device *dev) { return is_secure_guest(); diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c index 87f001b4c4e4..c083ecbbae4d 100644 --- a/arch/powerpc/platforms/pseries/svm.c +++ b/arch/powerpc/platforms/pseries/svm.c @@ -8,6 +8,7 @@ #include #include +#include #include #include #include @@ -63,7 +64,7 @@ void __init svm_swiotlb_init(void) int 
set_memory_encrypted(unsigned long addr, int numpages) { - if (!mem_encrypt_active()) + if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT)) return 0; if (!PAGE_ALIGNED(addr)) @@ -76,7 +77,7 @@ int set_memory_encrypted(unsigned long addr, int numpages) int set_memory_decrypted(unsigned long addr, int numpages) { - if (!mem_encrypt_active()) + if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT)) return 0; if (!PAGE_ALIGNED(addr)) diff --git a/arch/s390/include/asm/mem_encrypt.h b/arch/s390/include/asm/mem_encrypt.h index 2542cbf7e2d1..08a8b96606d7 100644 --- a/arch/s390/include/asm/mem_encrypt.h +++ b/arch/s390/include/asm/mem_encrypt.h @@ -4,8 +4,6 @@ #ifndef __ASSEMBLY__ -static inline bool mem_encrypt_active(void) { return false; } - int set_memory_encrypted(unsigned long addr, int numpages); int set_memory_decrypted(unsigned long addr, int numpages); diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h index da14ede311aa..2d4f5c17d79c 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -96,11 +96,6 @@ static inline void mem_encrypt_free_decrypted_mem(void) { } extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[]; -static inline bool mem_encrypt_active(void) -{ - return sme_me_mask; -} - static inline u64 sme_get_me_mask(void) { return sme_me_mask; diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index de01903c3735..fc5371a7e9d1 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -19,7 +19,7 @@ #include #include #include -#include +#include #include #include @@ -284,8 +284,13 @@ unsigned long __head __startup_64(unsigned long physaddr, * The bss section will be memset to zero later in the initialization so * there is no need to zero it after changing the memory encryption * attribute. +* +* This is early code, use an open coded check for SME instead of +* using cc_platform_has(). 
This eliminates worries about removing +* instrumentation or checking boot_cpu_data in the cc_platform_has() +* function. */ - if (mem_encrypt_active()) { + if (sme_get_me_mask()) { vaddr = (unsigned long)__start_bss_decrypted; vaddr_end = (unsigned long)__end_bss_decrypted; for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index b59a5cbc6bc5..026031b3b782 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -694,7 +694,7 @@ static bool __init
[PATCH 7/8] x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
From: Tom Lendacky Replace uses of sev_es_active() with the more generic cc_platform_has() using CC_ATTR_GUEST_STATE_ENCRYPT. If future support is added for other memory encyrption techonologies, the use of CC_ATTR_GUEST_STATE_ENCRYPT can be updated, as required. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/sev.c | 6 +++--- arch/x86/mm/mem_encrypt.c | 24 +++- arch/x86/realmode/init.c | 3 +-- 4 files changed, 7 insertions(+), 28 deletions(-) diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h index a5a58ccd1ee3..da14ede311aa 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void); void __init mem_encrypt_init(void); void __init sev_es_init_vc_handling(void); -bool sev_es_active(void); #define __bss_decrypted __section(".bss..decrypted") @@ -74,7 +73,6 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { } static inline void __init sme_enable(struct boot_params *bp) { } static inline void sev_es_init_vc_handling(void) { } -static inline bool sev_es_active(void) { return false; } static inline int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; } diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index a6895e440bc3..53a6837d354b 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -11,7 +11,7 @@ #include /* For show_regs() */ #include -#include +#include #include #include #include @@ -615,7 +615,7 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd) int cpu; u64 pfn; - if (!sev_es_active()) + if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) return 0; pflags = _PAGE_NX | _PAGE_RW; @@ -774,7 +774,7 @@ void __init sev_es_init_vc_handling(void) BUILD_BUG_ON(offsetof(struct sev_es_runtime_data, ghcb_page) % PAGE_SIZE); - if (!sev_es_active()) + if 
(!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) return; if (!sev_es_check_cpu_features()) diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index 932007a6913b..2d04c39bea1d 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -361,25 +361,6 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size) return early_set_memory_enc_dec(vaddr, size, true); } -/* - * SME and SEV are very similar but they are not the same, so there are - * times that the kernel will need to distinguish between SME and SEV. The - * cc_platform_has() function is used for this. When a distinction isn't - * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used. - * - * The trampoline code is a good example for this requirement. Before - * paging is activated, SME will access all memory as decrypted, but SEV - * will access all memory as encrypted. So, when APs are being brought - * up under SME the trampoline area cannot be encrypted, whereas under SEV - * the trampoline area must be encrypted. - */ - -/* Needs to be called from non-instrumentable code */ -bool noinstr sev_es_active(void) -{ - return sev_status & MSR_AMD64_SEV_ES_ENABLED; -} - /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ bool force_dma_unencrypted(struct device *dev) { @@ -449,7 +430,7 @@ static void print_mem_encrypt_feature_info(void) pr_cont(" SEV"); /* Encrypted Register State */ - if (sev_es_active()) + if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) pr_cont(" SEV-ES"); pr_cont("\n"); @@ -468,7 +449,8 @@ void __init mem_encrypt_init(void) * With SEV, we need to unroll the rep string I/O instructions, * but SEV-ES supports them through the #VC handler. 
*/ - if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && !sev_es_active()) + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && + !cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) static_branch_enable(_enable_key); print_mem_encrypt_feature_info(); diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index c878c5ee5a4c..4a3da7592b99 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -2,7 +2,6 @@ #include #include #include -#include #include #include @@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct trampoline_header *th) if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) th->flags |= TH_FLAGS_SME_ACTIVE; - if (sev_es_active()) { + if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) { /* * Skip the call to verify_cpu() in secondary_startup_64 as it * will cause #VC exceptions when the AP can't handle them yet.
[PATCH 1/8] x86/ioremap: Selectively build arch override encryption functions
From: Tom Lendacky In preparation for other uses of the cc_platform_has() function besides AMD's memory encryption support, selectively build the AMD memory encryption architecture override functions only when CONFIG_AMD_MEM_ENCRYPT=y. These functions are: - early_memremap_pgprot_adjust() - arch_memremap_can_ram_remap() Additionally, routines that are only invoked by these architecture override functions can also be conditionally built. These functions are: - memremap_should_map_decrypted() - memremap_is_efi_data() - memremap_is_setup_data() - early_memremap_is_setup_data() And finally, phys_mem_access_encrypted() is conditionally built as well, but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is not set. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/include/asm/io.h | 8 arch/x86/mm/ioremap.c | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 841a5d104afa..5c6a4af0b911 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -391,6 +391,7 @@ extern void arch_io_free_memtype_wc(resource_size_t start, resource_size_t size) #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc #endif +#ifdef CONFIG_AMD_MEM_ENCRYPT extern bool arch_memremap_can_ram_remap(resource_size_t offset, unsigned long size, unsigned long flags); @@ -398,6 +399,13 @@ extern bool arch_memremap_can_ram_remap(resource_size_t offset, extern bool phys_mem_access_encrypted(unsigned long phys_addr, unsigned long size); +#else +static inline bool phys_mem_access_encrypted(unsigned long phys_addr, +unsigned long size) +{ + return true; +} +#endif /** * iosubmit_cmds512 - copy data to single MMIO location, in 512-bit units diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 60ade7dd71bd..ccff76cedd8f 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -508,6 +508,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr) 
memunmap((void *)((unsigned long)addr & PAGE_MASK)); } +#ifdef CONFIG_AMD_MEM_ENCRYPT /* * Examine the physical address to determine if it is an area of memory * that should be mapped decrypted. If the memory is not part of the @@ -746,7 +747,6 @@ bool phys_mem_access_encrypted(unsigned long phys_addr, unsigned long size) return arch_memremap_can_ram_remap(phys_addr, size, 0); } -#ifdef CONFIG_AMD_MEM_ENCRYPT /* Remap memory with encryption */ void __init *early_memremap_encrypted(resource_size_t phys_addr, unsigned long size) -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
From: Tom Lendacky Replace uses of sme_active() with the more generic cc_platform_has() using CC_ATTR_HOST_MEM_ENCRYPT. If future support is added for other memory encryption technologies, the use of CC_ATTR_HOST_MEM_ENCRYPT can be updated, as required. This also replaces two usages of sev_active() that are really geared towards detecting if SME is active. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/include/asm/kexec.h | 2 +- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/machine_kexec_64.c | 15 --- arch/x86/kernel/pci-swiotlb.c| 9 - arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/mm/ioremap.c| 6 +++--- arch/x86/mm/mem_encrypt.c| 13 - arch/x86/mm/mem_encrypt_identity.c | 9 - arch/x86/realmode/init.c | 5 +++-- drivers/iommu/amd/init.c | 7 --- 10 files changed, 36 insertions(+), 34 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 0a6e34b07017..11b7c06e2828 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -129,7 +129,7 @@ relocate_kernel(unsigned long indirection_page, unsigned long page_list, unsigned long start_address, unsigned int preserve_context, - unsigned int sme_active); + unsigned int host_mem_enc_active); #endif #define ARCH_HAS_KIMAGE_ARCH diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h index 3fb9f5ebefa4..63c5b99ccae5 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void); void __init mem_encrypt_init(void); void __init sev_es_init_vc_handling(void); -bool sme_active(void); bool sev_active(void); bool sev_es_active(void); @@ -76,7 +75,6 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { } static inline void __init sme_enable(struct boot_params *bp) { } static inline void sev_es_init_vc_handling(void) { } -static inline bool sme_active(void) { return false; } static inline bool 
sev_active(void) { return false; } static inline bool sev_es_active(void) { return false; } diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 131f30fdcfbd..7040c0fa921c 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -358,7 +359,7 @@ void machine_kexec(struct kimage *image) (unsigned long)page_list, image->start, image->preserve_context, - sme_active()); + cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)); #ifdef CONFIG_KEXEC_JUMP if (image->preserve_context) @@ -569,12 +570,12 @@ void arch_kexec_unprotect_crashkres(void) */ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp) { - if (sev_active()) + if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) return 0; /* -* If SME is active we need to be sure that kexec pages are -* not encrypted because when we boot to the new kernel the +* If host memory encryption is active we need to be sure that kexec +* pages are not encrypted because when we boot to the new kernel the * pages won't be accessed encrypted (initially). */ return set_memory_decrypted((unsigned long)vaddr, pages); @@ -582,12 +583,12 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp) void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { - if (sev_active()) + if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) return; /* -* If SME is active we need to reset the pages back to being -* an encrypted mapping before freeing them. +* If host memory encryption is active we need to reset the pages back +* to being an encrypted mapping before freeing them. 
*/ set_memory_encrypted((unsigned long)vaddr, pages); } diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c index c2cfa5e7c152..814ab46a0dad 100644 --- a/arch/x86/kernel/pci-swiotlb.c +++ b/arch/x86/kernel/pci-swiotlb.c @@ -6,7 +6,7 @@ #include #include #include -#include +#include #include #include @@ -45,11 +45,10 @@ int __init pci_swiotlb_detect_4gb(void) swiotlb = 1; /* -* If SME is active then swiotlb will be set to 1 so that bounce -* buffers are allocated and used for devices that do not support -* the addressing range required for the encryption mask. +*
[PATCH 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
From: Tom Lendacky Introduce a powerpc version of the cc_platform_has() function. This will be used to replace the powerpc mem_encrypt_active() implementation, so the implementation will initially only support the CC_ATTR_MEM_ENCRYPT attribute. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov Acked-by: Michael Ellerman --- arch/powerpc/platforms/pseries/Kconfig | 1 + arch/powerpc/platforms/pseries/Makefile | 2 ++ arch/powerpc/platforms/pseries/cc_platform.c | 26 3 files changed, 29 insertions(+) create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig index 5e037df2a3a1..2e57391e0778 100644 --- a/arch/powerpc/platforms/pseries/Kconfig +++ b/arch/powerpc/platforms/pseries/Kconfig @@ -159,6 +159,7 @@ config PPC_SVM select SWIOTLB select ARCH_HAS_MEM_ENCRYPT select ARCH_HAS_FORCE_DMA_UNENCRYPTED + select ARCH_HAS_CC_PLATFORM help There are certain POWER platforms which support secure guests using the Protected Execution Facility, with the help of an Ultravisor diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile index 4cda0ef87be0..41d8aee98da4 100644 --- a/arch/powerpc/platforms/pseries/Makefile +++ b/arch/powerpc/platforms/pseries/Makefile @@ -31,3 +31,5 @@ obj-$(CONFIG_FA_DUMP) += rtas-fadump.o obj-$(CONFIG_SUSPEND) += suspend.o obj-$(CONFIG_PPC_VAS) += vas.o + +obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += cc_platform.o diff --git a/arch/powerpc/platforms/pseries/cc_platform.c b/arch/powerpc/platforms/pseries/cc_platform.c new file mode 100644 index ..e8021af83a19 --- /dev/null +++ b/arch/powerpc/platforms/pseries/cc_platform.c @@ -0,0 +1,26 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Confidential Computing Platform Capability checks + * + * Copyright (C) 2021 Advanced Micro Devices, Inc. 
+ * + * Author: Tom Lendacky + */ + +#include +#include + +#include +#include + +bool cc_platform_has(enum cc_attr attr) +{ + switch (attr) { + case CC_ATTR_MEM_ENCRYPT: + return is_secure_guest(); + + default: + return false; + } +} +EXPORT_SYMBOL_GPL(cc_platform_has); -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 6/8] x86/sev: Replace occurrences of sev_active() with cc_platform_has()
From: Tom Lendacky Replace uses of sev_active() with the more generic cc_platform_has() using CC_ATTR_GUEST_MEM_ENCRYPT. If future support is added for other memory encryption technologies, the use of CC_ATTR_GUEST_MEM_ENCRYPT can be updated, as required. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/crash_dump_64.c| 4 +++- arch/x86/kernel/kvm.c | 3 ++- arch/x86/kernel/kvmclock.c | 4 ++-- arch/x86/kernel/machine_kexec_64.c | 4 ++-- arch/x86/kvm/svm/svm.c | 3 ++- arch/x86/mm/ioremap.c | 6 +++--- arch/x86/mm/mem_encrypt.c | 21 - arch/x86/platform/efi/efi_64.c | 9 + 9 files changed, 27 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h index 63c5b99ccae5..a5a58ccd1ee3 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void); void __init mem_encrypt_init(void); void __init sev_es_init_vc_handling(void); -bool sev_active(void); bool sev_es_active(void); #define __bss_decrypted __section(".bss..decrypted") @@ -75,7 +74,6 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { } static inline void __init sme_enable(struct boot_params *bp) { } static inline void sev_es_init_vc_handling(void) { } -static inline bool sev_active(void) { return false; } static inline bool sev_es_active(void) { return false; } static inline int __init diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c index 045e82e8945b..a7f617a3981d 100644 --- a/arch/x86/kernel/crash_dump_64.c +++ b/arch/x86/kernel/crash_dump_64.c @@ -10,6 +10,7 @@ #include #include #include +#include static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize, unsigned long offset, int userbuf, @@ -73,5 +74,6 @@ ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize, ssize_t elfcorehdr_read(char *buf, 
size_t count, u64 *ppos) { - return read_from_oldmem(buf, count, ppos, 0, sev_active()); + return read_from_oldmem(buf, count, ppos, 0, + cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)); } diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index b656456c3a94..8863d1941f1b 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -418,7 +419,7 @@ static void __init sev_map_percpu_data(void) { int cpu; - if (!sev_active()) + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) return; for_each_possible_cpu(cpu) { diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index ad273e5861c1..fc3930c5db1b 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -16,9 +16,9 @@ #include #include #include +#include #include -#include #include #include @@ -232,7 +232,7 @@ static void __init kvmclock_init_mem(void) * hvclock is shared between the guest and the hypervisor, must * be mapped decrypted. 
*/ - if (sev_active()) { + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) { r = set_memory_decrypted((unsigned long) hvclock_mem, 1UL << order); if (r) { diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 7040c0fa921c..f5da4a18070a 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -167,7 +167,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd) } pte = pte_offset_kernel(pmd, vaddr); - if (sev_active()) + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) prot = PAGE_KERNEL_EXEC; set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot)); @@ -207,7 +207,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable) level4p = (pgd_t *)__va(start_pgtable); clear_page(level4p); - if (sev_active()) { + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) { info.page_flag |= _PAGE_ENC; info.kernpg_flag |= _PAGE_ENC; } diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 05e8d4d27969..a2f78a8cfdaa 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -455,7 +456,7 @@ static int has_svm(void) return 0; } - if (sev_active()) { + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) { pr_info("KVM is unsupported when running as an SEV guest\n");
[PATCH 3/8] x86/sev: Add an x86 version of cc_platform_has()
From: Tom Lendacky Introduce an x86 version of the cc_platform_has() function. This will be used to replace vendor specific calls like sme_active(), sev_active(), etc. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/Kconfig | 1 + arch/x86/include/asm/mem_encrypt.h | 1 + arch/x86/kernel/Makefile | 6 +++ arch/x86/kernel/cc_platform.c | 69 ++ arch/x86/mm/mem_encrypt.c | 1 + 5 files changed, 78 insertions(+) create mode 100644 arch/x86/kernel/cc_platform.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ab83c22d274e..9f190ec4f953 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1518,6 +1518,7 @@ config AMD_MEM_ENCRYPT select ARCH_HAS_FORCE_DMA_UNENCRYPTED select INSTRUCTION_DECODER select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS + select ARCH_HAS_CC_PLATFORM help Say yes to enable support for the encryption of system memory. This requires an AMD processor that supports Secure Memory diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h index 9c80c68d75b5..3fb9f5ebefa4 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -13,6 +13,7 @@ #ifndef __ASSEMBLY__ #include +#include #include diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 8f4e8fa6ed75..2ff3e600f426 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -21,6 +21,7 @@ CFLAGS_REMOVE_ftrace.o = -pg CFLAGS_REMOVE_early_printk.o = -pg CFLAGS_REMOVE_head64.o = -pg CFLAGS_REMOVE_sev.o = -pg +CFLAGS_REMOVE_cc_platform.o = -pg endif KASAN_SANITIZE_head$(BITS).o := n @@ -29,6 +30,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o:= n KASAN_SANITIZE_stacktrace.o:= n KASAN_SANITIZE_paravirt.o := n KASAN_SANITIZE_sev.o := n +KASAN_SANITIZE_cc_platform.o := n # With some compiler versions the generated code results in boot hangs, caused # by several compilation units. To be safe, disable all instrumentation. 
@@ -47,6 +49,7 @@ endif KCOV_INSTRUMENT:= n CFLAGS_head$(BITS).o += -fno-stack-protector +CFLAGS_cc_platform.o += -fno-stack-protector CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace @@ -147,6 +150,9 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)+= unwind_frame.o obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += sev.o + +obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += cc_platform.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c new file mode 100644 index ..03bb2f343ddb --- /dev/null +++ b/arch/x86/kernel/cc_platform.c @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Confidential Computing Platform Capability checks + * + * Copyright (C) 2021 Advanced Micro Devices, Inc. + * + * Author: Tom Lendacky + */ + +#include +#include +#include + +#include + +static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr) +{ +#ifdef CONFIG_INTEL_TDX_GUEST + return false; +#else + return false; +#endif +} + +/* + * SME and SEV are very similar but they are not the same, so there are + * times that the kernel will need to distinguish between SME and SEV. The + * cc_platform_has() function is used for this. When a distinction isn't + * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used. + * + * The trampoline code is a good example for this requirement. Before + * paging is activated, SME will access all memory as decrypted, but SEV + * will access all memory as encrypted. So, when APs are being brought + * up under SME the trampoline area cannot be encrypted, whereas under SEV + * the trampoline area must be encrypted. 
+ */ +static bool amd_cc_platform_has(enum cc_attr attr) +{ +#ifdef CONFIG_AMD_MEM_ENCRYPT + switch (attr) { + case CC_ATTR_MEM_ENCRYPT: + return sme_me_mask; + + case CC_ATTR_HOST_MEM_ENCRYPT: + return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED); + + case CC_ATTR_GUEST_MEM_ENCRYPT: + return sev_status & MSR_AMD64_SEV_ENABLED; + + case CC_ATTR_GUEST_STATE_ENCRYPT: + return sev_status & MSR_AMD64_SEV_ES_ENABLED; + + default: + return false; + } +#else + return false; +#endif +} + + +bool cc_platform_has(enum cc_attr attr) +{ + if (sme_me_mask) + return amd_cc_platform_has(attr); + + return false; +} +EXPORT_SYMBOL_GPL(cc_platform_has); diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index ff08dc463634..e29b1418d00c 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -20,6 +20,7
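[Editor's note: the attribute dispatch in amd_cc_platform_has()/cc_platform_has() above can be modeled in plain userspace C. Everything here — the mocked sme_me_mask/sev_status variables and the MSR bit values — is a stand-in for illustration, not the kernel's real state:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mocked state; in the kernel these come from SME/SEV early setup. */
#define MSR_AMD64_SEV_ENABLED    0x1ULL
#define MSR_AMD64_SEV_ES_ENABLED 0x2ULL

static uint64_t sme_me_mask;	/* nonzero when memory encryption is on */
static uint64_t sev_status;	/* SEV status MSR bits */

enum cc_attr {
	CC_ATTR_MEM_ENCRYPT,
	CC_ATTR_HOST_MEM_ENCRYPT,
	CC_ATTR_GUEST_MEM_ENCRYPT,
	CC_ATTR_GUEST_STATE_ENCRYPT,
};

static bool amd_cc_platform_has(enum cc_attr attr)
{
	switch (attr) {
	case CC_ATTR_MEM_ENCRYPT:
		return sme_me_mask;
	case CC_ATTR_HOST_MEM_ENCRYPT:
		/* SME: encryption active but not running as an SEV guest. */
		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
	case CC_ATTR_GUEST_MEM_ENCRYPT:
		return sev_status & MSR_AMD64_SEV_ENABLED;
	case CC_ATTR_GUEST_STATE_ENCRYPT:
		return sev_status & MSR_AMD64_SEV_ES_ENABLED;
	default:
		return false;
	}
}

static bool cc_platform_has(enum cc_attr attr)
{
	if (sme_me_mask)
		return amd_cc_platform_has(attr);
	return false;
}
```

The point of the switch is that one generic query replaces the sme_active()/sev_active()/sev_es_active() trio: callers name the property they care about and the vendor backend decides which hardware state implies it. ]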
[PATCH 2/8] arch/cc: Introduce a function to check for confidential computing features
From: Tom Lendacky In preparation for other confidential computing technologies, introduce a generic helper function, cc_platform_has(), that can be used to check for specific active confidential computing attributes, like memory encryption. This is intended to eliminate having to add multiple technology-specific checks to the code (e.g. if (sev_active() || tdx_active() || ... ). [ bp: s/_CC_PLATFORM_H/_LINUX_CC_PLATFORM_H/g ] Co-developed-by: Andi Kleen Signed-off-by: Andi Kleen Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/Kconfig| 3 ++ include/linux/cc_platform.h | 88 + 2 files changed, 91 insertions(+) create mode 100644 include/linux/cc_platform.h diff --git a/arch/Kconfig b/arch/Kconfig index 8df1c7102643..d1e69d6e8498 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1234,6 +1234,9 @@ config RELR config ARCH_HAS_MEM_ENCRYPT bool +config ARCH_HAS_CC_PLATFORM + bool + config HAVE_SPARSE_SYSCALL_NR bool help diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h new file mode 100644 index ..a075b70b9a70 --- /dev/null +++ b/include/linux/cc_platform.h @@ -0,0 +1,88 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Confidential Computing Platform Capability checks + * + * Copyright (C) 2021 Advanced Micro Devices, Inc. + * + * Author: Tom Lendacky + */ + +#ifndef _LINUX_CC_PLATFORM_H +#define _LINUX_CC_PLATFORM_H + +#include +#include + +/** + * enum cc_attr - Confidential computing attributes + * + * These attributes represent confidential computing features that are + * currently active. + */ +enum cc_attr { + /** +* @CC_ATTR_MEM_ENCRYPT: Memory encryption is active +* +* The platform/OS is running with active memory encryption. This +* includes running either as a bare-metal system or a hypervisor +* and actively using memory encryption or as a guest/virtual machine +* and actively using memory encryption. 
+* +* Examples include SME, SEV and SEV-ES. +*/ + CC_ATTR_MEM_ENCRYPT, + + /** +* @CC_ATTR_HOST_MEM_ENCRYPT: Host memory encryption is active +* +* The platform/OS is running as a bare-metal system or a hypervisor +* and actively using memory encryption. +* +* Examples include SME. +*/ + CC_ATTR_HOST_MEM_ENCRYPT, + + /** +* @CC_ATTR_GUEST_MEM_ENCRYPT: Guest memory encryption is active +* +* The platform/OS is running as a guest/virtual machine and actively +* using memory encryption. +* +* Examples include SEV and SEV-ES. +*/ + CC_ATTR_GUEST_MEM_ENCRYPT, + + /** +* @CC_ATTR_GUEST_STATE_ENCRYPT: Guest state encryption is active +* +* The platform/OS is running as a guest/virtual machine and actively +* using memory encryption and register state encryption. +* +* Examples include SEV-ES. +*/ + CC_ATTR_GUEST_STATE_ENCRYPT, +}; + +#ifdef CONFIG_ARCH_HAS_CC_PLATFORM + +/** + * cc_platform_has() - Checks if the specified cc_attr attribute is active + * @attr: Confidential computing attribute to check + * + * The cc_platform_has() function will return an indicator as to whether the + * specified Confidential Computing attribute is currently active. + * + * Context: Any context + * Return: + * * TRUE - Specified Confidential Computing attribute is active + * * FALSE - Specified Confidential Computing attribute is not active + */ +bool cc_platform_has(enum cc_attr attr); + +#else /* !CONFIG_ARCH_HAS_CC_PLATFORM */ + +static inline bool cc_platform_has(enum cc_attr attr) { return false; } + +#endif /* CONFIG_ARCH_HAS_CC_PLATFORM */ + +#endif /* _LINUX_CC_PLATFORM_H */ -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 0/8] Implement generic cc_platform_has() helper function
From: Borislav Petkov Hi all, here's v4 of the cc_platform_has() patchset with feedback incorporated. I'm going to route this through tip if there are no objections. Thx. Tom Lendacky (8): x86/ioremap: Selectively build arch override encryption functions arch/cc: Introduce a function to check for confidential computing features x86/sev: Add an x86 version of cc_platform_has() powerpc/pseries/svm: Add a powerpc version of cc_platform_has() x86/sme: Replace occurrences of sme_active() with cc_platform_has() x86/sev: Replace occurrences of sev_active() with cc_platform_has() x86/sev: Replace occurrences of sev_es_active() with cc_platform_has() treewide: Replace the use of mem_encrypt_active() with cc_platform_has() arch/Kconfig | 3 + arch/powerpc/include/asm/mem_encrypt.h | 5 -- arch/powerpc/platforms/pseries/Kconfig | 1 + arch/powerpc/platforms/pseries/Makefile | 2 + arch/powerpc/platforms/pseries/cc_platform.c | 26 ++ arch/powerpc/platforms/pseries/svm.c | 5 +- arch/s390/include/asm/mem_encrypt.h | 2 - arch/x86/Kconfig | 1 + arch/x86/include/asm/io.h| 8 ++ arch/x86/include/asm/kexec.h | 2 +- arch/x86/include/asm/mem_encrypt.h | 12 +-- arch/x86/kernel/Makefile | 6 ++ arch/x86/kernel/cc_platform.c| 69 +++ arch/x86/kernel/crash_dump_64.c | 4 +- arch/x86/kernel/head64.c | 9 +- arch/x86/kernel/kvm.c| 3 +- arch/x86/kernel/kvmclock.c | 4 +- arch/x86/kernel/machine_kexec_64.c | 19 +++-- arch/x86/kernel/pci-swiotlb.c| 9 +- arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kernel/sev.c| 6 +- arch/x86/kvm/svm/svm.c | 3 +- arch/x86/mm/ioremap.c| 18 ++-- arch/x86/mm/mem_encrypt.c| 55 arch/x86/mm/mem_encrypt_identity.c | 9 +- arch/x86/mm/pat/set_memory.c | 3 +- arch/x86/platform/efi/efi_64.c | 9 +- arch/x86/realmode/init.c | 8 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +- drivers/gpu/drm/drm_cache.c | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +- drivers/iommu/amd/init.c | 7 +- drivers/iommu/amd/iommu.c| 3 +- 
drivers/iommu/amd/iommu_v2.c | 3 +- drivers/iommu/iommu.c| 3 +- fs/proc/vmcore.c | 6 +- include/linux/cc_platform.h | 88 include/linux/mem_encrypt.h | 4 - kernel/dma/swiotlb.c | 4 +- 40 files changed, 310 insertions(+), 129 deletions(-) create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c create mode 100644 arch/x86/kernel/cc_platform.c create mode 100644 include/linux/cc_platform.h -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 06/20] dma-direct: support PCI P2PDMA pages in dma-direct map_sg
On Thu, Sep 16, 2021 at 05:40:46PM -0600, Logan Gunthorpe wrote: > Add PCI P2PDMA support for dma_direct_map_sg() so that it can map > PCI P2PDMA pages directly without a hack in the callers. This allows > for heterogeneous SGLs that contain both P2PDMA and regular pages. > > A P2PDMA page may have three possible outcomes when being mapped: > 1) If the data path between the two devices doesn't go through the > root port, then it should be mapped with a PCI bus address > 2) If the data path goes through the host bridge, it should be mapped > normally, as though it were a CPU physical address > 3) It is not possible for the two devices to communicate and thus > the mapping operation should fail (and it will return -EREMOTEIO). > > SGL segments that contain PCI bus addresses are marked with > sg_dma_mark_pci_p2pdma() and are ignored when unmapped. > > Signed-off-by: Logan Gunthorpe > kernel/dma/direct.c | 44 ++-- > 1 file changed, 38 insertions(+), 6 deletions(-) > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > index 4c6c5e0635e3..fa8317e8ff44 100644 > +++ b/kernel/dma/direct.c > @@ -13,6 +13,7 @@ > #include > #include > #include > +#include > #include "direct.h" > > /* > @@ -421,29 +422,60 @@ void dma_direct_sync_sg_for_cpu(struct device *dev, > arch_sync_dma_for_cpu_all(); > } > > +/* > + * Unmaps segments, except for ones marked as pci_p2pdma which do not > + * require any further action as they contain a bus address. 
> + */ > void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, > int nents, enum dma_data_direction dir, unsigned long attrs) > { > struct scatterlist *sg; > int i; > > - for_each_sg(sgl, sg, nents, i) > - dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir, > - attrs); > + for_each_sg(sgl, sg, nents, i) { > + if (sg_is_dma_pci_p2pdma(sg)) { > + sg_dma_unmark_pci_p2pdma(sg); > + } else { > + dma_direct_unmap_page(dev, sg->dma_address, > + sg_dma_len(sg), dir, attrs); > + } If the main usage of this SGL bit is to indicate if it has been DMA mapped, or not, I think it should be renamed to something clearer. p2pdma is being used for lots of things now, it feels very counter-intuitive that P2PDMA pages are not flagged with something called sg_is_dma_pci_p2pdma(). How about sg_is_dma_unmapped_address() ? > > for_each_sg(sgl, sg, nents, i) { > + if (is_pci_p2pdma_page(sg_page(sg))) { > + map = pci_p2pdma_map_segment(_state, dev, sg); > + switch (map) { > + case PCI_P2PDMA_MAP_BUS_ADDR: > + continue; > + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: > + /* > + * Mapping through host bridge should be > + * mapped normally, thus we do nothing > + * and continue below. > + */ > + break; > + default: > + ret = -EREMOTEIO; > + goto out_unmap; > + } > + } > + > sg->dma_address = dma_direct_map_page(dev, sg_page(sg), > sg->offset, sg->length, dir, attrs); dma_direct_map_page() can trigger swiotlb and I didn't see this series dealing with that? It would probably be fine for now to fail swiotlb_map() for p2p pages? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 05/20] dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
On Thu, Sep 16, 2021 at 05:40:45PM -0600, Logan Gunthorpe wrote:
> Add EREMOTEIO error return to dma_map_sgtable() which will be used
> by .map_sg() implementations that detect P2PDMA pages that the
> underlying DMA device cannot access.
>
> Signed-off-by: Logan Gunthorpe
> ---
>  kernel/dma/mapping.c | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe

Jason
Re: [PATCH v3 04/20] PCI/P2PDMA: introduce helpers for dma_map_sg implementations
On Thu, Sep 16, 2021 at 05:40:44PM -0600, Logan Gunthorpe wrote: > Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg() > implementations. It takes an scatterlist segment that must point to a > pci_p2pdma struct page and will map it if the mapping requires a bus > address. > > The return value indicates whether the mapping required a bus address > or whether the caller still needs to map the segment normally. If the > segment should not be mapped, -EREMOTEIO is returned. > > This helper uses a state structure to track the changes to the > pgmap across calls and avoid needing to lookup into the xarray for > every page. > > Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU > dma_map_sg() implementations where the sg segment containing the page > differs from the sg segment containing the DMA address. > > Signed-off-by: Logan Gunthorpe > drivers/pci/p2pdma.c | 59 ++ > include/linux/pci-p2pdma.h | 21 ++ > 2 files changed, 80 insertions(+) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index b656d8c801a7..58c34f1f1473 100644 > +++ b/drivers/pci/p2pdma.c > @@ -943,6 +943,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, > struct scatterlist *sg, > } > EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs); > > +/** > + * pci_p2pdma_map_segment - map an sg segment determining the mapping type > + * @state: State structure that should be declared outside of the > for_each_sg() > + * loop and initialized to zero. > + * @dev: DMA device that's doing the mapping operation > + * @sg: scatterlist segment to map > + * > + * This is a helper to be used by non-iommu dma_map_sg() implementations > where > + * the sg segment is the same for the page_link and the dma_address. > + * > + * Attempt to map a single segment in an SGL with the PCI bus address. > + * The segment must point to a PCI P2PDMA page and thus must be > + * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check. 
> + * > + * Returns the type of mapping used and maps the page if the type is > + * PCI_P2PDMA_MAP_BUS_ADDR. > + */ > +enum pci_p2pdma_map_type > +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device > *dev, > +struct scatterlist *sg) > +{ > + if (state->pgmap != sg_page(sg)->pgmap) { > + state->pgmap = sg_page(sg)->pgmap; > + state->map = pci_p2pdma_map_type(state->pgmap, dev); > + state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset; > + } Is this safe? I was just talking with Joao about this, https://lore.kernel.org/r/20210928180150.gi3544...@ziepe.ca API wise I absolutely think this should be safe as written, but is it really? Does pgmap ensure that a positive refcount struct page always has a valid pgmap pointer (and thus the mess in gup can be deleted) or does this need to get the pgmap as well to keep it alive? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote: > That's close to what we want. > > The size should probably be implicit. If it isn't implicit, it needs to > at least be double-checked against the state sizes. > > Not to get too fancy, but I think we also want to have a "replace" > operation which is separate from the "update". Think of a case where > you are trying to set a bit: > > struct pkru_state *pk = start_update_xstate(tsk, XSTATE_PKRU); > pk->pkru |= 0x100; > finish_update_xstate(tsk, XSTATE_PKRU, pk); > > versus setting a whole new value: > > struct pkru_state *pk = start_replace_xstate(tsk, XSTATE_PKRU); > memset(pkru, sizeof(*pk), 0); > pk->pkru = 0x1234; > finish_replace_xstate(tsk, XSTATE_PKRU, pk); > > They look similar, but they actually might have very different costs for > big states like AMX. We can also do some very different debugging for > them. In normal operation, these _can_ just return pointers directly to > the fpu...->xstate in some case. But, if we're debugging, we could > kmalloc() a buffer and do sanity checking before it goes into the task > buffer. > > We don't have to do any of that fancy stuff now. We don't even need to > do an "update" if all we need for now for XFEATURE_PASID is a "replace". > > 1. Hide whether we need to write to real registers > 2. Hide whether we need to update the in-memory image > 3. Hide other FPU infrastructure like the TIF flag. > 4. Make the users deal with a *whole* state in the replace API Is that difference just whether you need to save the state from registers to memory (for the "update" case) or not (for the "replace" case ... where you can ignore the current register, overwrite the whole per-feature xsave area and mark it to be restored to registers). If so, just a "bool full" argument might do the trick? Also - you have a "tsk" argument in your pseudo code. Is this needed? Are there places where we need to perform these operations on something other than "current"? 
pseudo-code:

void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full)
{
	void *addr;

	BUG_ON(!(xsave->header.xcomp_bv & xfeature));

	addr = __raw_xsave_addr(xsave, xfeature);

	fpregs_lock();

	if (full)
		return addr;

	if (xfeature registers are "live")
		xsaves(xstate, 1 << xfeature);

	return addr;
}

void finish_update_one_xsave_feature(enum xfeature xfeature)
{
	mark feature modified
	set TIF bit
	fpregs_unlock();
}

-Tony
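[Editor's note: the begin/finish protocol sketched above — lock, conditionally save live registers into the memory image only for a partial update, edit, then mark the feature modified so it is restored on return to user space — can be mocked in userspace C. The struct fields and function names below are illustrative stand-ins, not kernel APIs:

```c
#include <stdbool.h>
#include <string.h>

/* Mocked per-task xstate: one feature's save area plus bookkeeping. */
struct xstate_mock {
	unsigned char area[64];	/* in-memory image of the feature   */
	bool regs_live;		/* registers newer than memory?     */
	bool modified;		/* memory newer than registers?     */
	bool locked;		/* stands in for fpregs_lock() held */
};

static struct xstate_mock xs;

static void *begin_update_one_feature(bool full)
{
	xs.locked = true;		/* fpregs_lock() */
	/*
	 * A full replace overwrites the whole per-feature area, so the
	 * current register contents are irrelevant; only a partial
	 * update needs them saved to memory first (xsaves()).
	 */
	if (!full && xs.regs_live)
		xs.regs_live = false;	/* model the xsaves() save */
	return xs.area;
}

static void finish_update_one_feature(void)
{
	xs.modified = true;		/* set TIF bit: reload before return */
	xs.locked = false;		/* fpregs_unlock() */
}
```

This is where the "replace vs. update" cost difference Dave describes shows up: the full=true path skips the register save entirely. ]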
Re: [PATCH v3 03/20] PCI/P2PDMA: make pci_p2pdma_map_type() non-static
On Thu, Sep 16, 2021 at 05:40:43PM -0600, Logan Gunthorpe wrote: > +enum pci_p2pdma_map_type { > + /* > + * PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping > + * type hasn't been calculated yet. Functions that return this enum > + * never return this value. > + */ > + PCI_P2PDMA_MAP_UNKNOWN = 0, > + > + /* > + * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will > + * traverse the host bridge and the host bridge is not in the > + * whitelist. DMA Mapping routines should return an error when I gather we are supposed to type allowlist now > + * this is returned. > + */ > + PCI_P2PDMA_MAP_NOT_SUPPORTED, > + > + /* > + * PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to > + * eachother directly through a PCI switch and the transaction will 'each other' > + * not traverse the host bridge. Such a mapping should program > + * the DMA engine with PCI bus addresses. > + */ > + PCI_P2PDMA_MAP_BUS_ADDR, > + > + /* > + * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk > + * to eachother, but the transaction traverses a host bridge on the 'each other' > + * whitelist. In this case, a normal mapping either with CPU physical > + * addresses (in the case of dma-direct) or IOVA addresses (in the > + * case of IOMMUs) should be used to program the DMA engine. > + */ > + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, > +}; Reviewed-by: Jason Gunthorpe Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 01/20] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
On Thu, Sep 16, 2021 at 05:40:41PM -0600, Logan Gunthorpe wrote: > Make use of the third free LSB in scatterlist's page_link on 64bit systems. > > The extra bit will be used by dma_[un]map_sg_p2pdma() to determine when a > given SGL segments dma_address points to a PCI bus address. > dma_unmap_sg_p2pdma() will need to perform different cleanup when a > segment is marked as P2PDMA. > > Using this bit requires adding an additional dependency on CONFIG_64BIT to > CONFIG_PCI_P2PDMA. This should be acceptable as the majority of P2PDMA > use cases are restricted to newer root complexes and roughly require the > extra address space for memory BARs used in the transactions. > > Signed-off-by: Logan Gunthorpe > Reviewed-by: John Hubbard > drivers/pci/Kconfig | 2 +- > include/linux/scatterlist.h | 50 ++--- > 2 files changed, 47 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig > index 0c473d75e625..90b4bddb3300 100644 > +++ b/drivers/pci/Kconfig > @@ -163,7 +163,7 @@ config PCI_PASID > > config PCI_P2PDMA > bool "PCI peer-to-peer transfer support" > - depends on ZONE_DEVICE > + depends on ZONE_DEVICE && 64BIT Perhaps a comment to explain what the 64bit is doing? > select GENERIC_ALLOCATOR > help > Enableѕ drivers to do PCI peer-to-peer transactions to and from > diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h > index 266754a55327..e62b1cf6386f 100644 > +++ b/include/linux/scatterlist.h > @@ -64,6 +64,21 @@ struct sg_append_table { > #define SG_CHAIN 0x01UL > #define SG_END 0x02UL > > +/* > + * bit 2 is the third free bit in the page_link on 64bit systems which > + * is used by dma_unmap_sg() to determine if the dma_address is a PCI > + * bus address when doing P2PDMA. > + * Note: CONFIG_PCI_P2PDMA depends on CONFIG_64BIT because of this. > + */ > + > +#ifdef CONFIG_PCI_P2PDMA > +#define SG_DMA_PCI_P2PDMA0x04UL Add a static_assert(__alignof__(void *) == 8); ? 
> +#else > +#define SG_DMA_PCI_P2PDMA0x00UL > +#endif > + > +#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_DMA_PCI_P2PDMA) > + > /* > * We overload the LSB of the page pointer to indicate whether it's > * a valid sg entry, or whether it points to the start of a new scatterlist. > @@ -72,7 +87,9 @@ struct sg_append_table { > #define sg_is_chain(sg) ((sg)->page_link & SG_CHAIN) > #define sg_is_last(sg) ((sg)->page_link & SG_END) > #define sg_chain_ptr(sg) \ > - ((struct scatterlist *) ((sg)->page_link & ~(SG_CHAIN | SG_END))) > + ((struct scatterlist *)((sg)->page_link & ~SG_PAGE_LINK_MASK)) > + > +#define sg_is_dma_pci_p2pdma(sg) ((sg)->page_link & SG_DMA_PCI_P2PDMA) I've been encouraging people to use static inlines more.. static inline unsigned int __sg_flags(struct scatterlist *sg) { return sg->page_link & SG_PAGE_LINK_MASK; } static inline bool sg_is_chain(struct scatterlist *sg) { return __sg_flags(sg) & SG_CHAIN; } static inline bool sg_is_last(struct scatterlist *sg) { return __sg_flags(sg) & SG_END; } static inline bool sg_is_dma_pci_p2pdma(struct scatterlist *sg) { return __sg_flags(sg) & SG_DMA_PCI_P2PDMA; } > /** > * sg_assign_page - Assign a given page to an SG entry > @@ -86,13 +103,13 @@ struct sg_append_table { > **/ > static inline void sg_assign_page(struct scatterlist *sg, struct page *page) > { > - unsigned long page_link = sg->page_link & (SG_CHAIN | SG_END); > + unsigned long page_link = sg->page_link & SG_PAGE_LINK_MASK; I think this should just be '& SG_END', sg_assign_page() doesn't look like it should ever be used on a sg_chain entry, so this is just trying to preserve the end stamp. Anyway, this looks OK Reviewed-by: Jason Gunthorpe Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
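[Editor's note: the trick being reviewed — stashing flag bits in the low bits of an 8-byte-aligned pointer, which is why CONFIG_PCI_P2PDMA grows a 64BIT dependency — is easy to demonstrate in standalone C. This is a simplified model, not the kernel's struct scatterlist:

```c
#include <assert.h>
#include <stdbool.h>

#define SG_CHAIN          0x01UL
#define SG_END            0x02UL
#define SG_DMA_PCI_P2PDMA 0x04UL
#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_DMA_PCI_P2PDMA)

struct page;			/* opaque stand-in for struct page */

struct scatterlist {
	unsigned long page_link;	/* page pointer | flag bits */
};

static unsigned long sg_flags(const struct scatterlist *sg)
{
	return sg->page_link & SG_PAGE_LINK_MASK;
}

static bool sg_is_dma_pci_p2pdma(const struct scatterlist *sg)
{
	return sg_flags(sg) & SG_DMA_PCI_P2PDMA;
}

static struct page *sg_page(const struct scatterlist *sg)
{
	return (struct page *)(sg->page_link & ~SG_PAGE_LINK_MASK);
}

static void sg_assign_page(struct scatterlist *sg, struct page *page)
{
	unsigned long flags = sg->page_link & SG_PAGE_LINK_MASK;

	/* Pointer values must not collide with the three flag bits,
	 * i.e. every page pointer must be at least 8-byte aligned. */
	assert(((unsigned long)page & SG_PAGE_LINK_MASK) == 0);
	sg->page_link = (unsigned long)page | flags;
}
```

Jason's static-inline rewrite above is the same idea with one `__sg_flags()` accessor shared by all three predicates, which keeps the mask logic in one place. ]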
Re: [PATCH v2 9/9] iommu/vt-d: Use pci core's DVSEC functionality
On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky wrote:
>
> Reduce maintenance burden of DVSEC query implementation by using the
> centralized PCI core implementation.
>
> Cc: iommu@lists.linux-foundation.org
> Cc: David Woodhouse
> Cc: Lu Baolu
> Signed-off-by: Ben Widawsky
> ---
>  drivers/iommu/intel/iommu.c | 15 +--
>  1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..30c97181f0ae 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5398,20 +5398,7 @@ static int intel_iommu_disable_sva(struct device *dev)
>   */
>  static int siov_find_pci_dvsec(struct pci_dev *pdev)
>  {
> -	int pos;
> -	u16 vendor, id;
> -
> -	pos = pci_find_next_ext_capability(pdev, 0, 0x23);
> -	while (pos) {
> -		pci_read_config_word(pdev, pos + 4, &vendor);
> -		pci_read_config_word(pdev, pos + 8, &id);
> -		if (vendor == PCI_VENDOR_ID_INTEL && id == 5)
> -			return pos;
> -
> -		pos = pci_find_next_ext_capability(pdev, pos, 0x23);
> -	}
> -
> -	return 0;
> +	return pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_INTEL, 5);
>  }

Same comments as the CXL patch, siov_find_pci_dvsec() doesn't seem to
have a reason to exist anymore.

What is 5?
Re: [PATCH v2 7/9] cxl/pci: Use pci core's DVSEC functionality
On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky wrote:
>
> Reduce maintenance burden of DVSEC query implementation by using the
> centralized PCI core implementation.
>
> Signed-off-by: Ben Widawsky
> ---
>  drivers/cxl/pci.c | 20 +---
>  1 file changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 5eaf2736f779..79d4d9b16d83 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -340,25 +340,7 @@ static void cxl_pci_unmap_regblock(struct cxl_mem *cxlm,
> struct cxl_register_map
>
>  static int cxl_pci_dvsec(struct pci_dev *pdev, int dvsec)

cxl_pci_dvsec() has no reason to exist anymore. Let's just have the
caller use pci_find_dvsec_capability() directly.
Re: [PATCH v2 5/9] cxl/pci: Make more use of cxl_register_map
On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky wrote:
>
> The structure exists to pass around information about register mapping.
> Using it more extensively cleans up many existing functions.

I would have liked to have seen "add @base to cxl_register_map" and "use
@map for @bar and @offset arguments" somewhere in this changelog to set
expectations for what changes are included. That would have also
highlighted that adding a @base to cxl_register_map deserves its own
patch vs the conversion of @bar and @offset to instead use @map->bar and
@map->offset.

Can you resend with that split and those mentions?
Re: [PATCH v2 4/9] cxl/pci: Refactor cxl_pci_setup_regs
On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky wrote: > > In preparation for moving parts of register mapping to cxl_core, the > cxl_pci driver is refactored to utilize a new helper to find register > blocks by type. > > cxl_pci scanned through all register blocks and mapping the ones that > the driver will use. This logic is inverted so that the driver > specifically requests the register blocks from a new helper. Under the > hood, the same implementation of scanning through all register locator > DVSEC entries exists. > > There are 2 behavioral changes (#2 is arguable): > 1. A dev_err is introduced if cxl_map_regs fails. > 2. The previous logic would try to map component registers and device >registers multiple times if there were present and keep the mapping >of the last one found (furthest offset in the register locator). >While this is disallowed in the spec, CXL 2.0 8.1.9: "Each register >block identifier shall only occur once in the Register Locator DVSEC >structure" it was how the driver would respond to the spec violation. >The new logic will take the first found register block by type and >move on. > > Signed-off-by: Ben Widawsky > > --- > Changes since v1: No changes? Luckily git am strips this section... Overall I think this refactor can be broken down further for readability and cleanup the long standing problem that the driver maps component registers for no reason. The main contributing factor to readability is that cxl_setup_pci_regs() still exists after the refactor, which also contributes to the component register problem. If the register mapping is up leveled to the caller of cxl_setup_pci_regs() (and drops mapping component registers) then a follow-on patch to rename cxl_setup_pci_regs to find_register_block becomes easier to read. Moving the cxl_register_map array out of cxl_setup_pci_regs() also makes a later patch to pass component register enumeration details to the endpoint-port that much cleaner. 
Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting
Hi, Thomas,

On Sun, Sep 26, 2021 at 01:13:50AM +0200, Thomas Gleixner wrote:
> Fenghua,
>
> On Fri, Sep 24 2021 at 16:12, Fenghua Yu wrote:
> > On Fri, Sep 24, 2021 at 03:18:12PM +0200, Thomas Gleixner wrote:
> >> But OTOH why do you need a per task reference count on the PASID at all?
> >>
> >> The PASID is fundamentally tied to the mm and the mm can't go away before
> >> the threads have gone away unless this magically changed after I checked
> >> that ~20 years ago.
> >
> > There are up to 1M PASIDs because PASID is 20-bit. I think there are a few
> > ways to allocate and free PASID:
> >
> > 1. Statically allocate a PASID once a mm is created and free it in mm
> >    exit. No PASID allocation/free during the mm's lifetime. Then
> >    up to 1M processes can be created due to 1M PASIDs limitation.
> >    We don't want this method because the 1M processes limitation.
>
> I'm not so worried about the 1M limitation, but it obviously makes sense
> to avoid that because allocating stuff which is not used is pointless in
> general.
>
> > 2. A PASID is allocated to the mm in open(dev)->bind(dev, mm). There
> >    are three ways to free it:
> >    (a) Actively free it in close(fd)->unbind(dev, mm) by sending
> >    IPIs to tell all tasks using the PASID to clear the IA32_PASID
> >    MSR. This has locking issues similar to the actively loading
> >    IA32_PASID MSR which was force disabled in upstream. So won't work.
>
> Exactly.
>
> >    (b) Passively free the PASID in destroy_context(mm) in mm exit. Once
> >    the PASID is allocated, it stays with the process for the lifetime. It's
> >    better than #1 because the PASID is allocated only on demand.
>
> Which is simple and makes a lot of sense. See below.
>
> >    (c) Passively free the PASID in deactivate_mm(mm) or unbind() whenever
> >    there is no usage as implemented in this series. Tracking the PASID usage
> >    per task provides a chance to free the PASID on task exit. The
> >    PASID has a better chance to be freed earlier than mm exit in #(b).
> >
> > This series uses #2 and #(c) to allocate and free the PASID for a better
> > chance to ease the 1M PASIDs limitation pressure. For example, a thread
> > doing open(dev)->ENQCMD->close(fd)->exit(2) will not occupy a PASID while
> > its sibling threads are still running.
>
> I'm not seeing that as a realistic problem. Applications which use this
> kind of devices are unlikely to behave exactly that way.
>
> 2^20 PASIDs are really plenty and just adding code for the theoretical
> case of PASID pressure is a pointless exercise IMO. It just adds
> complexity for no reason.
>
> IMO reality will be that either you have long lived processes with tons
> of threads which use such devices over and over or short lived forked
> processes which open the device, do the job, close and exit. Both
> scenarios are fine with allocate on first use and drop on process exit.
>
> I think with your approach you create overhead for applications which
> use thread pools where the threads get work thrown at them and do open()
> -> do_stuff() -> close() and then go back to wait for the next job which
> will do exactly the same thing. So you add the overhead of refcounts in
> general and in the worst case if the refcount drops to zero then the
> next worker has to allocate a new PASID instead of just moving on.
>
> So unless you have a really compelling real world usecase argument, I'm
> arguing that the PASID pressure problem is a purely academic exercise.
>
> I think you are conflating two things here:
>
>  1) PASID lifetime
>  2) PASID MSR overhead
>
> Which is not correct: You still can and have to optimize the per thread
> behaviour vs. the PASID MSR: Track per thread whether it ever needed the
> PASID and act upon that.
>
> If the thread just does ENQCMD once in its lifetime, then so be
> it. That's not a realistic use case, really.
>
> And if someone does this then this does not mean we have to optimize for
> that. Optimizing for possible stupid implementations is the wrong
> approach. There is no technical measure against stupidity. If that would
> exist the world would be a much better place.
>
> You really have to think about the problem space you are working
> on. There are problems which need a 'get it right at the first shot'
> solution because they create user space ABI or other hard to fix
> dependencies.
>
> That's absolutely not the case here.
>
> Get the basic simple support correct and work from there. Trying to
> solve all possible theoretical problems upfront is simply not possible
> and a guarantee for not making progress.
>
> "Keep it simple" and "correctness first" are still the best working
> engineering principles.
>
> They do not prevent us from revisiting this _if_ there is a real world
> problem which makes enough sense to implement a finer grained solution.

Sure. Will free the PASID in destroy_context() on mm exit and won't
Re: [PATCH v2 3/9] cxl/pci: Remove pci request/release regions
On Thu, Sep 23, 2021 at 10:26 AM Ben Widawsky wrote:
>
> Quoting Dan, "... the request + release regions should probably just be
> dropped. It's not like any of the register enumeration would collide
> with someone else who already has the registers mapped. The collision
> only comes when the registers are mapped for their final usage, and that
> will have more precision in the request."

Looks good to me:

Reviewed-by: Dan Williams

>
> Recommended-by: Dan Williams

This isn't one of the canonical tags:
Documentation/process/submitting-patches.rst

I'll change this to Suggested-by:

> Signed-off-by: Ben Widawsky
> ---
>  drivers/cxl/pci.c | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index ccc7c2573ddc..7256c236fdb3 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -453,9 +453,6 @@ static int cxl_pci_setup_regs(struct cxl_mem *cxlm)
>  		return -ENXIO;
>  	}
>
> -	if (pci_request_mem_regions(pdev, pci_name(pdev)))
> -		return -ENODEV;
> -
>  	/* Get the size of the Register Locator DVSEC */
>  	pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, &regloc_size);
>  	regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size);
> @@ -499,8 +496,6 @@ static int cxl_pci_setup_regs(struct cxl_mem *cxlm)
>  		n_maps++;
>  	}
>
> -	pci_release_mem_regions(pdev);
> -
>  	for (i = 0; i < n_maps; i++) {
>  		ret = cxl_map_regs(cxlm, &maps[i]);
>  		if (ret)
> --
> 2.33.0
>
Re: [PATCH v2 2/9] cxl/pci: Remove dev_dbg for unknown register blocks
On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky wrote:
>
> While interesting to driver developers, the dev_dbg message doesn't do
> much except clutter up logs. This information should be attainable
> through sysfs, and someday lspci like utilities. This change
> additionally helps reduce the LOC in a subsequent patch to refactor some
> of cxl_pci register mapping.

Looks good to me:

Reviewed-by: Dan Williams
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On Tue, Sep 28, 2021 at 09:35:05PM +0800, Lu Baolu wrote:
> Another issue is, when putting a device into user-dma mode, all devices
> belonging to the same iommu group shouldn't be bound with a kernel-dma
> driver. Kevin's prototype checks this by READ_ONCE(dev->driver). This is
> not lock safe as discussed below,
>
> https://lore.kernel.org/linux-iommu/20210927130935.gz964...@nvidia.com/
>
> Any guidance on this?

Something like this?

int iommu_set_device_dma_owner(struct device *dev, enum device_dma_owner mode,
			       struct file *user_owner)
{
	struct iommu_group *group = group_from_dev(dev);

	spin_lock(&iommu_group->dma_owner_lock);
	switch (mode) {
	case DMA_OWNER_KERNEL:
		if (iommu_group->dma_users[DMA_OWNER_USERSPACE])
			return -EBUSY;
		break;
	case DMA_OWNER_SHARED:
		break;
	case DMA_OWNER_USERSPACE:
		if (iommu_group->dma_users[DMA_OWNER_KERNEL])
			return -EBUSY;
		if (iommu_group->dma_owner_file != user_owner) {
			if (iommu_group->dma_users[DMA_OWNER_USERSPACE])
				return -EPERM;
			get_file(user_owner);
			iommu_group->dma_owner_file = user_owner;
		}
		break;
	default:
		spin_unlock(&iommu_group->dma_owner_lock);
		return -EINVAL;
	}
	iommu_group->dma_users[mode]++;
	spin_unlock(&iommu_group->dma_owner_lock);
	return 0;
}

int iommu_release_device_dma_owner(struct device *dev,
				   enum device_dma_owner mode)
{
	struct iommu_group *group = group_from_dev(dev);

	spin_lock(&iommu_group->dma_owner_lock);
	if (WARN_ON(!iommu_group->dma_users[mode]))
		goto err_unlock;
	if (!iommu_group->dma_users[mode]--) {
		if (mode == DMA_OWNER_USERSPACE) {
			fput(iommu_group->dma_owner_file);
			iommu_group->dma_owner_file = NULL;
		}
	}
err_unlock:
	spin_unlock(&iommu_group->dma_owner_lock);
}

Where, the driver core does before probe:
	iommu_set_device_dma_owner(dev, DMA_OWNER_KERNEL, NULL)

pci_stub/etc does in their probe func:
	iommu_set_device_dma_owner(dev, DMA_OWNER_SHARED, NULL)

And vfio/iommfd does when a struct vfio_device FD is attached:
	iommu_set_device_dma_owner(dev, DMA_OWNER_USERSPACE, group_file/iommu_file)

Jason
Re: [PATCH 2/2] [v2] qcom_scm: hide Kconfig symbol
On 9/28/21 2:50 AM, Arnd Bergmann wrote: From: Arnd Bergmann Now that SCM can be a loadable module, we have to add another dependency to avoid link failures when ipa or adreno-gpu are built-in: aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe': ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available' ld.lld: error: undefined symbol: qcom_scm_is_available referenced by adreno_gpu.c gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in archive drivers/built-in.a This can happen when CONFIG_ARCH_QCOM is disabled and we don't select QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd use a similar dependency here to what we have for QCOM_RPROC_COMMON, but that causes dependency loops from other things selecting QCOM_SCM. This appears to be an endless problem, so try something different this time: - CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on' but that is simply selected by all of its users - All the stubs in include/linux/qcom_scm.h can go away - arm-smccc.h needs to provide a stub for __arm_smccc_smc() to allow compile-testing QCOM_SCM on all architectures. - To avoid a circular dependency chain involving RESET_CONTROLLER and PINCTRL_SUNXI, drop the 'select RESET_CONTROLLER' statement. According to my testing this still builds fine, and the QCOM platform selects this symbol already. Acked-by: Kalle Valo Signed-off-by: Arnd Bergmann --- Changes in v2: - drop the 'select RESET_CONTROLLER' line, rather than adding more of the same --- drivers/firmware/Kconfig| 5 +- drivers/gpu/drm/msm/Kconfig | 4 +- drivers/iommu/Kconfig | 2 +- drivers/media/platform/Kconfig | 2 +- drivers/mmc/host/Kconfig| 2 +- drivers/net/ipa/Kconfig | 1 + For drivers/net/ipa/Kconfig, looks good to me. Nice simplification. 
Acked-by: Alex Elder

 drivers/net/wireless/ath/ath10k/Kconfig |  2 +-
 drivers/pinctrl/qcom/Kconfig            |  3 +-
 include/linux/arm-smccc.h               | 10 
 include/linux/qcom_scm.h                | 71 -
 10 files changed, 20 insertions(+), 82 deletions(-)

. . .
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
Hi Jason,

On 2021/9/28 19:57, Jason Gunthorpe wrote:
On Tue, Sep 28, 2021 at 07:30:41AM +, Tian, Kevin wrote:
Also, don't call it "hint", there is nothing hinty about this, it has
definitive functional impacts.

possibly dma_mode (too broad?) or dma_usage

You just need a flag to specify if the driver manages DMA ownership
itself, or if it requires the driver core to setup kernel ownership

DMA_OWNER_KERNEL
DMA_OWNER_DRIVER_CONTROLLED

?

There is a bool 'suppress_bind_attrs' already so it could be done like
this:

	bool suppress_bind_attrs:1;

	/* If set the driver must call iommu_XX as the first action in probe() */
	bool suppress_dma_owner:1;

Which is pretty low cost.

Yes. Pretty low cost to fix the BUG_ON() issue. Any kernel-DMA driver
binding is blocked if the device's iommu group has been put into user-
dma mode.

Another issue is, when putting a device into user-dma mode, all devices
belonging to the same iommu group shouldn't be bound with a kernel-dma
driver. Kevin's prototype checks this by READ_ONCE(dev->driver). This is
not lock safe as discussed below,

https://lore.kernel.org/linux-iommu/20210927130935.gz964...@nvidia.com/

Any guidance on this?

Best regards,
baolu
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On Tue, Sep 28, 2021 at 07:30:41AM +, Tian, Kevin wrote:
> > Also, don't call it "hint", there is nothing hinty about this, it has
> > definitive functional impacts.
>
> possibly dma_mode (too broad?) or dma_usage

You just need a flag to specify if the driver manages DMA ownership
itself, or if it requires the driver core to setup kernel ownership

DMA_OWNER_KERNEL
DMA_OWNER_DRIVER_CONTROLLED

?

There is a bool 'suppress_bind_attrs' already so it could be done like
this:

	bool suppress_bind_attrs:1;

	/* If set the driver must call iommu_XX as the first action in probe() */
	bool suppress_dma_owner:1;

Which is pretty low cost.

Jason
[PATCH v5] iommu/amd: Use report_iommu_fault()
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

The latter point is the main aim of this patch, as it allows
rasdaemon-like daemons to be notified of I/O page faults, and to
possibly initiate corrective action in response.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already trigger this
tracepoint -- but this isn't yet the case for AMD-Vi and Intel DMAR.

The AMD IOMMU specification suggests that the bit in an I/O page fault
event log entry that signals whether an I/O page fault was for a read
request or for a write request is only meaningful when the faulting
access was to a present page, but some testing on a Ryzen 3700X
suggests that this bit encodes the correct value even for I/O page
faults to non-present pages, and therefore, this patch passes the R/W
information up the stack even for I/O page faults to non-present pages.

Signed-off-by: Lennert Buytenhek
---
Tested on v5.15-rc3 on a Ryzen 3700X and on a Ryzen 5950X, where it
has the desired effect.

Changes for v5:
- Code formatting tweaking. (Suggested by Joerg Roedel.)

Changes for v4:
- Unconditionally pass through RW bit, after testing that suggests
  that this bit is reliable even for I/O page faults to non-present
  pages.

Changes for v3:
- Test fault flags via macros. (Suggested by Suravee Suthikulpanit.)

Changes for v2:
- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)
 drivers/iommu/amd/amd_iommu_types.h |  2 ++
 drivers/iommu/amd/iommu.c           | 21 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 8dbe61e2b3c1..867535eb0ce9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,8 @@
 #define EVENT_DOMID_MASK_HI	0xf
 #define EVENT_FLAGS_MASK	0xfff
 #define EVENT_FLAGS_SHIFT	0x10
+#define EVENT_FLAG_RW		0x020
+#define EVENT_FLAG_I		0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN	0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1722bb161841..beadcffcc223 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,6 +473,12 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
 	pci_dev_put(pdev);
 }
 
+#define IS_IOMMU_MEM_TRANSACTION(flags)	\
+	(((flags) & EVENT_FLAG_I) == 0)
+
+#define IS_WRITE_REQUEST(flags)	\
+	((flags) & EVENT_FLAG_RW)
+
 static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 					u64 address, int flags)
 {
@@ -485,6 +491,20 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
 	if (dev_data) {
+		/*
+		 * If this is a DMA fault (for which the I(nterrupt)
+		 * bit will be unset), allow report_iommu_fault() to
+		 * prevent logging it.
+		 */
+		if (IS_IOMMU_MEM_TRANSACTION(flags)) {
+			if (!report_iommu_fault(&dev_data->domain->domain,
+						&pdev->dev, address,
+						IS_WRITE_REQUEST(flags) ?
+						IOMMU_FAULT_WRITE :
+						IOMMU_FAULT_READ))
+				goto out;
+		}
+
 		if (__ratelimit(&dev_data->rs)) {
 			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
 				domain_id, address, flags);
@@ -495,6 +515,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 			domain_id, address, flags);
 	}
 
+out:
 	if (pdev)
 		pci_dev_put(pdev);
 }
-- 
2.31.1
Re: [PATCH] iommu/dma: Tidy up Kconfig selects
On 28.09.2021 11:30, Joerg Roedel wrote:
> On Mon, Sep 13, 2021 at 01:41:19PM +0100, Robin Murphy wrote:
>> Now that the dust has settled on converting all the x86 drivers to
>> iommu-dma, we can punt the Kconfig selection to arch code where it
>> was always intended to be.
> Can we select IOMMU_DMA under IOMMU_SUPPORT instead? The only drivers
> not using IOMMU_DMA are the arm32 ones, afaics.
>
> If we could get rid of the arm32 exception, the IOMMU_DMA symbol could
> also go away entirely and we handle it under IOMMU_SUPPORT instead. But
> that is something for the future :)

Maybe it would be a good motivation to get back to
https://lore.kernel.org/linux-iommu/cover.1597931875.git.robin.mur...@arm.com/ :)

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [PATCH] iommu/mediatek: fix out-of-range warning with clang
On Mon, Sep 27, 2021 at 02:18:44PM +0200, Arnd Bergmann wrote: > drivers/iommu/mtk_iommu.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) Applied, thanks.
Re: [PATCH v4] iommu/amd: Use report_iommu_fault()
On Sat, Sep 25, 2021 at 05:18:02PM +0300, Lennert Buytenhek wrote:
> +#define IS_IOMMU_MEM_TRANSACTION(flags)	\
> +	(((flags) & EVENT_FLAG_I) == 0)
> +
> +#define IS_WRITE_REQUEST(flags)	\
> +	((flags) & EVENT_FLAG_RW)
> +
>  static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
>  					u64 address, int flags)
>  {
> @@ -484,6 +490,18 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
>  	if (pdev)
>  		dev_data = dev_iommu_priv_get(&pdev->dev);
>
> +	/*
> +	 * If this is a DMA fault (for which the I(nterrupt) bit will
> +	 * be unset), allow report_iommu_fault() to prevent logging it.
> +	 */
> +	if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
> +		if (!report_iommu_fault(&dev_data->domain->domain,
> +					&pdev->dev, address,
> +					IS_WRITE_REQUEST(flags) ?
> +					IOMMU_FAULT_WRITE : IOMMU_FAULT_READ))
> +			goto out;

This is better placed inside the 'if (dev_data)' statement below.

Otherwise it looks good to me.

Regards,

Joerg
Re: [PATCH] iommu/dart: Clear sid2group entry when a group is freed
On Fri, Sep 24, 2021 at 03:45:02PM +0200, Sven Peter wrote: > drivers/iommu/apple-dart.c | 38 +++--- > 1 file changed, 35 insertions(+), 3 deletions(-) Applied for v5.15, thanks.
Re: [PATCH] iommu/ipmmu-vmsa: Hook up r8a77980 DT matching code
On Thu, Sep 23, 2021 at 10:11:16PM +0300, Nikita Yushchenko wrote: > drivers/iommu/ipmmu-vmsa.c | 3 +++ > 1 file changed, 3 insertions(+) Applied, thanks.
Re: [PATCH 0/1] iommu/vt-d: A fix for v5.15-rc3
On Wed, Sep 22, 2021 at 01:47:25PM +0800, Lu Baolu wrote: > Bjorn Helgaas (1): > iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses Applied, thanks.
Re: [PATCH] iommu/dart: Remove iommu_flush_ops
On Tue, Sep 21, 2021 at 05:39:34PM +0200, Sven Peter wrote: > Fixes: 46d1fb072e76b161 ("iommu/dart: Add DART iommu driver") > Reported-by: Marc Zyngier > Signed-off-by: Sven Peter > --- > drivers/iommu/apple-dart.c | 18 -- > 1 file changed, 18 deletions(-) Applied for v5.15, thanks Sven.
Re: [PATCH] iommu/dma: Tidy up Kconfig selects
On Mon, Sep 13, 2021 at 01:41:19PM +0100, Robin Murphy wrote: > Now that the dust has settled on converting all the x86 drivers to > iommu-dma, we can punt the Kconfig selection to arch code where it > was always intended to be. Can we select IOMMU_DMA under IOMMU_SUPPORT instead? The only drivers not using IOMMU_DMA are the arm32 ones, afaics. If we could get rid of the arm32 exception, the IOMMU_DMA symbol could also go away entirely and we handle it under IOMMU_SUPPORT instead. But that is something for the future :) Regards, Joerg
Re: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver
On 9/28/2021 1:39 PM, Christoph Hellwig wrote:
On Mon, Sep 27, 2021 at 10:26:43PM +0800, Tianyu Lan wrote:
Hi Christoph:
Gentle ping. The swiotlb and shared memory mapping changes in this
patchset need your review. Could you have a look?

> I'm a little too busy for a review of such a huge patchset right now.

That being said here are my comments from a very quick review:

Hi Christoph:
Thanks for your comments. Most patches in the series are Hyper-V
changes. I will split the patchset and make it easy to review.

- the bare memremap usage in swiotlb looks strange and I'd definitively
  expect a well documented wrapper.

OK. Should the wrapper be in the DMA code? How about dma_map_decrypted()
introduced in the V4?
https://lkml.org/lkml/2021/8/27/605

- given that we can now hand out swiotlb memory for coherent mappings
  we need to carefully audit what happens when this memremaped memory
  gets mmaped or used through dma_get_sgtable

OK. I will check that.

- the netvsc changes I'm not happy with at all. A large part of it is
  that the driver already has a bad structure, but this series is making
  it significantly worse. We'll need to find a way to use the proper dma
  mapping abstractions here. One option if you want to stick to the
  double vmapped buffer would be something like using
  dma_alloc_noncontiguous plus a variant of dma_vmap_noncontiguous that
  takes the shared_gpa_boundary into account.

OK. I will do that.
Re: [PATCH 0/2] iommu: DMA domain epilogue
Hi Robin, On Mon, Sep 13, 2021 at 01:40:04PM +0100, Robin Murphy wrote: > Robin Murphy (2): > iommu/dart: Clean up IOVA cookie crumbs > iommu/dma: Unexport IOVA cookie management Thanks, all applied now.