Re: [PATCH 1/5] iommu: Implement iommu_put_resv_regions_simple()

2019-09-18 Thread Will Deacon
On Thu, Aug 29, 2019 at 01:17:48PM +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Implement a generic function for removing reserved regions. This can be
> used by drivers that don't do anything fancy with these regions other
> than allocating memory for them.
> 
> Signed-off-by: Thierry Reding 
> ---
>  drivers/iommu/iommu.c | 19 +++++++++++++++++++
>  include/linux/iommu.h |  2 ++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0f585b614657..73a2a6b13507 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2170,6 +2170,25 @@ void iommu_put_resv_regions(struct device *dev, struct list_head *list)
>   ops->put_resv_regions(dev, list);
>  }
>  
> +/**
> + * iommu_put_resv_regions_simple - Reserved region driver helper
> + * @dev: device for which to free reserved regions
> + * @list: reserved region list for device
> + *
> + * IOMMU drivers can use this to implement their .put_resv_regions() callback
> + * for simple reservations. Memory allocated for each reserved region will be
> + * freed. If an IOMMU driver allocates additional resources per region, it is
> + * going to have to implement a custom callback.
> + */
> +void iommu_put_resv_regions_simple(struct device *dev, struct list_head *list)
> +{
> + struct iommu_resv_region *entry, *next;
> +
> + list_for_each_entry_safe(entry, next, list, list)
> + kfree(entry);
> +}
> +EXPORT_SYMBOL(iommu_put_resv_regions_simple);
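
For a driver that needs nothing beyond freeing the regions, wiring this up
is then a one-liner in its ops table; a minimal illustrative sketch (the
driver names here are hypothetical):

static const struct iommu_ops my_iommu_ops = {
	/* ... other callbacks ... */
	.get_resv_regions	= my_iommu_get_resv_regions,
	.put_resv_regions	= iommu_put_resv_regions_simple,
};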

Can you call this directly from iommu_put_resv_regions() if the function
pointer in ops is NULL? That would save having to plumb the default callback
into a bunch of drivers.
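
Concretely, something like this minimal sketch (assuming ops comes from
dev->bus->iommu_ops, as in the rest of iommu.c):

void iommu_put_resv_regions(struct device *dev, struct list_head *list)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;

	if (ops && ops->put_resv_regions)
		ops->put_resv_regions(dev, list);
	else
		iommu_put_resv_regions_simple(dev, list);
}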

Will


Re: [RFC v4 0/3] vhost: introduce mdev based hardware backend

2019-09-18 Thread Michael S. Tsirkin
On Wed, Sep 18, 2019 at 01:51:21PM +0800, Jason Wang wrote:
> 
> On 2019/9/17 6:58 PM, Tiwei Bie wrote:
> > On Tue, Sep 17, 2019 at 11:32:03AM +0800, Jason Wang wrote:
> > > > On 2019/9/17 9:02 AM, Tiwei Bie wrote:
> > > > This RFC is to demonstrate below ideas,
> > > > 
> > > > a) Build vhost-mdev on top of the same abstraction defined in
> > > >  the virtio-mdev series [1];
> > > > 
> > > > b) Introduce /dev/vhost-mdev to do vhost ioctls and support
> > > >  setting mdev device as backend;
> > > > 
> > > > Now the userspace API looks like this:
> > > > 
> > > > - Userspace generates a compatible mdev device;
> > > > 
> > > > - Userspace opens this mdev device with VFIO API (including
> > > > doing IOMMU programming for this mdev device with VFIO's
> > > > container/group based interface);
> > > > 
> > > > - Userspace opens /dev/vhost-mdev and gets vhost fd;
> > > > 
> > > > - Userspace uses vhost ioctls to setup vhost (userspace should
> > > > do VHOST_MDEV_SET_BACKEND ioctl with VFIO group fd and device
> > > > fd first before doing other vhost ioctls);
> > > > 
> > > > Only compile test has been done for this series for now.
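
For illustration, that sequence could look roughly like the C sketch below.
The VHOST_MDEV_SET_BACKEND ioctl number and argument layout here are
assumptions made up for the sketch, not the RFC's actual uapi:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Hypothetical layout; the RFC's real definition may differ. */
struct vhost_mdev_backend {
	int group_fd;	/* VFIO group fd backing the mdev */
	int device_fd;	/* VFIO device fd of the mdev */
};

/* Hypothetical ioctl number using the vhost magic (0xAF). */
#define VHOST_MDEV_SET_BACKEND _IOW(0xAF, 0x80, struct vhost_mdev_backend)

int setup_vhost_mdev(int group_fd, int device_fd)
{
	struct vhost_mdev_backend backend = {
		.group_fd  = group_fd,
		.device_fd = device_fd,
	};
	int vhost_fd = open("/dev/vhost-mdev", O_RDWR);

	if (vhost_fd < 0)
		return -1;

	/* Must come before any other vhost ioctl. */
	if (ioctl(vhost_fd, VHOST_MDEV_SET_BACKEND, &backend) < 0)
		return -1;

	/* ... VHOST_GET_FEATURES, VHOST_SET_VRING_*, etc. follow ... */
	return vhost_fd;
}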
> > > 
> > > I've thought hard about the architecture:
> > Thanks a lot! Do appreciate it!
> > 
> > > 1) Create a vhost char device and pass the vfio mdev device fd to it as a
> > > backend, translating vhost-mdev ioctls to the virtio-mdev transport (e.g.
> > > read/write). DMA is done through the VFIO DMA mapping on the container
> > > that is attached.
> > Yeah, that's what we are doing in this series.
> > 
> > > We have two more choices:
> > > 
> > > 2) Use vfio-mdev but do not create a vhost-mdev device; instead, just
> > > implement the vhost ioctls on vfio_device_ops, and translate them into the
> > > virtio-mdev transport or just pass the ioctl to the parent.
> > Yeah. Instead of introducing /dev/vhost-mdev char device, do
> > vhost ioctls on VFIO device fd directly. That's what we did
> > in RFC v3.
> > 
> > > 3) Don't use vfio-mdev; create a new vhost-mdev driver which, during probe,
> > > still tries to add the dev to a vfio group and talks to the parent with
> > > device specific ops
> > If my understanding is correct, this means we need to introduce
> > a new VFIO device driver to replace the existing vfio-mdev driver
> > in our case. Below is a quick draft just to show my understanding:
> > 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > 
> > #include "mdev_private.h"
> > 
> > /* XXX: we need a proper way to include below vhost header. */
> > #include "../../vhost/vhost.h"
> > 
> > static int vfio_vhost_mdev_open(void *device_data)
> > {
> > if (!try_module_get(THIS_MODULE))
> > return -ENODEV;
> > 
> > /* ... */
> > vhost_dev_init(...);
> > 
> > return 0;
> > }
> > 
> > static void vfio_vhost_mdev_release(void *device_data)
> > {
> > /* ... */
> > module_put(THIS_MODULE);
> > }
> > 
> > static long vfio_vhost_mdev_unlocked_ioctl(void *device_data,
> >                                            unsigned int cmd, unsigned long arg)
> > {
> > struct mdev_device *mdev = device_data;
> > struct mdev_parent *parent = mdev->parent;
> > 
> > /*
> >  * Use vhost ioctls.
> >  *
> >  * We will have a different parent_ops design.
> >  * And potentially, we can share the same parent_ops
> >  * with virtio_mdev.
> >  */
> > switch (cmd) {
> > case VHOST_GET_FEATURES:
> > parent->ops->get_features(mdev, ...);
> > break;
> > /* ... */
> > }
> > 
> > return 0;
> > }
> > 
> > static ssize_t vfio_vhost_mdev_read(void *device_data, char __user *buf,
> > size_t count, loff_t *ppos)
> > {
> > /* ... */
> > return 0;
> > }
> > 
> > static ssize_t vfio_vhost_mdev_write(void *device_data, const char __user *buf,
> >  size_t count, loff_t *ppos)
> > {
> > /* ... */
> > return 0;
> > }
> > 
> > static int vfio_vhost_mdev_mmap(void *device_data, struct vm_area_struct *vma)
> > {
> > /* ... */
> > return 0;
> > }
> > 
> > static const struct vfio_device_ops vfio_vhost_mdev_dev_ops = {
> > .name   = "vfio-vhost-mdev",
> > .open   = vfio_vhost_mdev_open,
> > .release= vfio_vhost_mdev_release,
> > .ioctl  = vfio_vhost_mdev_unlocked_ioctl,
> > .read   = vfio_vhost_mdev_read,
> > .write  = vfio_vhost_mdev_write,
> > .mmap   = vfio_vhost_mdev_mmap,
> > };
> > 
> > static int vfio_vhost_mdev_probe(struct device *dev)
> > {
> > struct mdev_device *mdev = to_mdev_device(dev);
> > 
> > /* ... */
> > return vfio_add_group_dev(dev, &vfio_vhost_mdev_dev_ops, mdev);
> > }
> > 
> > static void vfio_vhost_mdev_remove(struct device *dev)
> > {
> > /* ... */
> > vfio_del_group_dev(dev);
> > }
> > 
> > static struct mdev_driver vfio_vhost_mdev_driver = {
> > .name   = 

Re: [PATCH v6] virtio-fs: add virtiofs filesystem

2019-09-18 Thread Michael S. Tsirkin
On Thu, Sep 12, 2019 at 04:19:31PM +0200, Miklos Szeredi wrote:
> From: Stefan Hajnoczi 
> 
> Michael,
> 
> Here's a v6 of the virtiofs code (fuse.git#virtiofs-v6).  I think we've
> addressed all your comments.
> 
> Would you mind giving it another look, and if you're satisfied acking this
> patch?
> 
> Thanks,
> Miklos
> 
> 
> Add a basic file system module for virtio-fs.  This does not yet contain
> shared data support between host and guest or metadata coherency speedups.
> However it is already significantly faster than virtio-9p.
> 
> Design Overview
> ===============
> 
> With the goal of designing something with better performance and local file
> system semantics, a bunch of ideas were proposed.
> 
>  - Use fuse protocol (instead of 9p) for communication between guest and
>host.  Guest kernel will be fuse client and a fuse server will run on
>host to serve the requests.
> 
>  - For data access inside guest, mmap portion of file in QEMU address space
>and guest accesses this memory using dax.  That way guest page cache is
>bypassed and there is only one copy of data (on host).  This will also
>enable mmap(MAP_SHARED) between guests.
> 
>  - For metadata coherency, there is a shared memory region which contains
>version number associated with metadata and any guest changing metadata
>updates version number and other guests refresh metadata on next access.
>This is yet to be implemented.
> 
> How virtio-fs differs from existing approaches
> ==============================================
> 
> The unique idea behind virtio-fs is to take advantage of the co-location of
> the virtual machine and hypervisor to avoid communication (vmexits).
> 
> DAX allows file contents to be accessed without communication with the
> hypervisor.  The shared memory region for metadata avoids communication in
> the common case where metadata is unchanged.
> 
> By replacing expensive communication with cheaper shared memory accesses,
> we expect to achieve better performance than approaches based on network
> file system protocols.  In addition, this also makes it easier to achieve
> local file system semantics (coherency).
> 
> These techniques are not applicable to network file system protocols since
> the communications channel is bypassed by taking advantage of shared memory
> on a local machine.  This is why we decided to build virtio-fs rather than
> focus on 9P or NFS.
> 
> Caching Modes
> =============
> 
> Like virtio-9p, different caching modes are supported which determine the
> coherency level as well.  The “cache=FOO” and “writeback” options control
> the level of coherence between the guest and host filesystems.
> 
>  - cache=none
>metadata, data and pathname lookup are not cached in guest.  They are
>always fetched from host and any changes are immediately pushed to host.
> 
>  - cache=always
>metadata, data and pathname lookup are cached in guest and never expire.
> 
>  - cache=auto
>metadata and pathname lookup cache expires after a configured amount of
>time (default is 1 second).  Data is cached while the file is open
>(close to open consistency).
> 
>  - writeback/no_writeback
>These options control the writeback strategy.  If writeback is disabled,
>then normal writes will immediately be synchronized with the host fs.
>If writeback is enabled, then writes may be cached in the guest until
>the file is closed or an fsync(2) is performed.  This option has no effect
>on mmap-ed writes or writes going through the DAX mechanism.
> 
> Signed-off-by: Stefan Hajnoczi 
> Signed-off-by: Vivek Goyal 
> Signed-off-by: Miklos Szeredi 


Overall this looks ok to me. Handling full vq by a timer is really gross
but it's correct - just terribly inefficient.  I think you should add a
MAINTAINERS entry though, we want
virtualization@lists.linux-foundation.org Cc'd on patches.
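
The "full vq by a timer" handling referred to is roughly the following
pattern; an illustrative sketch only, not the actual virtio_fs.c code:

#include <linux/scatterlist.h>
#include <linux/virtio.h>
#include <linux/workqueue.h>

struct virtio_fs_vq {
	struct virtqueue *vq;
	struct delayed_work dispatch_work;
	/* ... pending-request list, lock, ... */
};

/*
 * If the ring is full, re-arm a delayed work item and retry later:
 * correct, but it polls on a timer instead of resubmitting from the
 * used-buffer notification path.
 */
static void virtio_fs_try_dispatch(struct virtio_fs_vq *fsvq,
				   struct scatterlist *sgs[],
				   unsigned int out_sgs, unsigned int in_sgs,
				   void *req)
{
	int ret = virtqueue_add_sgs(fsvq->vq, sgs, out_sgs, in_sgs, req,
				    GFP_ATOMIC);

	if (ret == -ENOSPC)	/* vq full: poll again shortly */
		schedule_delayed_work(&fsvq->dispatch_work,
				      msecs_to_jiffies(1));
}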

With that corrected:

Acked-by: Michael S. Tsirkin 

Who's going to merge this? Miklos do you want to merge it yourself?

> ---
>  fs/fuse/Kconfig |   11 +
>  fs/fuse/Makefile|1 +
>  fs/fuse/fuse_i.h|9 +
>  fs/fuse/inode.c |4 +
>  fs/fuse/virtio_fs.c | 1195 +++
>  include/uapi/linux/virtio_fs.h  |   19 +
>  include/uapi/linux/virtio_ids.h |1 +
>  7 files changed, 1240 insertions(+)
>  create mode 100644 fs/fuse/virtio_fs.c
>  create mode 100644 include/uapi/linux/virtio_fs.h
> 
> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> index 24fc5a5c1b97..0635cba19971 100644
> --- a/fs/fuse/Kconfig
> +++ b/fs/fuse/Kconfig
> @@ -27,3 +27,14 @@ config CUSE
>  
> If you want to develop or use a userspace character device
> based on CUSE, answer Y or M.
> +
> +config VIRTIO_FS
> + tristate "Virtio Filesystem"
> + depends on FUSE_FS
> + select VIRTIO
> + help
> +   The Virtio Filesystem allows guests to mount file systems from the
> +  

Re: [RFC PATCH 2/2] mdev: introduce device specific ops

2019-09-18 Thread Jason Wang


On 2019/9/18 10:57 AM, Tian, Kevin wrote:

From: Jason Wang [mailto:jasow...@redhat.com]
Sent: Tuesday, September 17, 2019 6:17 PM

On 2019/9/17 4:09 PM, Tian, Kevin wrote:

From: Jason Wang
Sent: Thursday, September 12, 2019 5:40 PM

Currently, except for create and remove, the rest of the fields of
mdev_parent_ops are designed only for the vfio-mdev driver and may not
help for kernel mdev drivers. So, following the device id support added
by the previous patch, this patch introduces a device specific ops
pointer which points to the device specific ops (e.g. vfio ops). This
allows future drivers like virtio-mdev to implement their own device
specific ops.

Can you give an example of what ops might be required to support a
kernel mdev driver? I know you posted a link earlier, but putting a small
example here can save time and avoid inconsistent understanding. It will
then help show whether the proposed split makes sense, or whether there is
a possibility of redefining the callbacks to meet both requirements.
imo those callbacks fulfill some basic requirements when mediating
a device...

I put it in the cover letter.

The link is https://lkml.org/lkml/2019/9/10/135, which abuses the current
VFIO based mdev parent ops.

Thanks

So the main problem is the handling of userspace pointers vs.
kernel space pointers. You still implement read/write/ioctl
callbacks, which are a subset of the current parent_ops definition.
In that regard, would it be better to introduce some helper to handle
the pointer difference in the mdev core, while still keeping the
same set of parent ops (in whatever form is suitable for both)?
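
One shape such a helper could take, purely as an illustration (the field
names and the assumption that the parent ->write() takes a kernel buffer
are hypothetical):

#include <linux/err.h>
#include <linux/mdev.h>
#include <linux/slab.h>
#include <linux/string.h>

/*
 * Bounce the user buffer into the kernel so the same parent ->write()
 * callback can serve both the VFIO (userspace) path and in-kernel
 * callers such as virtio-mdev.
 */
static ssize_t mdev_write_from_user(struct mdev_device *mdev,
				    const char __user *ubuf,
				    size_t count, loff_t *ppos)
{
	void *kbuf = memdup_user(ubuf, count);
	ssize_t ret;

	if (IS_ERR(kbuf))
		return PTR_ERR(kbuf);

	ret = mdev->parent->ops->write(mdev, kbuf, count, ppos);
	kfree(kbuf);
	return ret;
}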



Pointers are one of the issues. And read/write/ioctl are designed for a
userspace API, not the kernel. Technically, we can use them from the
kernel, but it would not be as simple and straightforward as a set of
device specific callback functions. The link above is just an example;
e.g. we can simply pass the vring address through a dedicated API
instead of mandating an offset into a file.
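
A rough sketch of what such device specific callbacks might look like
for a virtio-mdev style driver (all names hypothetical, echoing the
get_features example and the dedicated vring-address idea above):

#include <linux/types.h>

struct mdev_device;

struct virtio_mdev_device_ops {
	u64  (*get_features)(struct mdev_device *mdev);
	int  (*set_features)(struct mdev_device *mdev, u64 features);
	/* vring addresses passed directly, not via a file offset */
	int  (*set_vring_address)(struct mdev_device *mdev, u16 index,
				  u64 desc_addr, u64 avail_addr,
				  u64 used_addr);
	void (*kick_vq)(struct mdev_device *mdev, u16 index);
};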


Thanks



