Re: [PATCH V3 1/7] mdev: class id support

2019-10-15 Thread Jason Wang


On 2019/10/16 12:38 AM, Alex Williamson wrote:

On Fri, 11 Oct 2019 16:15:51 +0800
Jason Wang  wrote:
   

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..724e9b9841d8 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -45,6 +45,12 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data)
  }
  EXPORT_SYMBOL(mdev_set_drvdata);
  
+void mdev_set_class(struct mdev_device *mdev, u16 id)

+{
+   mdev->class_id = id;
+}
+EXPORT_SYMBOL(mdev_set_class);
+
  struct device *mdev_dev(struct mdev_device *mdev)
  {
	return &mdev->dev;
@@ -135,6 +141,7 @@ static int mdev_device_remove_cb(struct device *dev, void 
*data)
   * mdev_register_device : Register a device
   * @dev: device structure representing parent device.
   * @ops: Parent device operation structure to be registered.
+ * @id: class id.
   *
   * Add device to list of registered parent devices.
   * Returns a negative value on error, otherwise 0.
@@ -324,6 +331,9 @@ int mdev_device_create(struct kobject *kobj,
if (ret)
goto ops_create_fail;
  
+	if (!mdev->class_id)

This is a sanity test failure of the parent driver on a privileged
path; I think it's fair to print a warning when this occurs rather than
only returning an errno to the user.  In fact, ret is not set to an error
value here, so it looks like this fails to create the device but
returns success.  Thanks,

Alex



Will fix.

Thanks

+   goto class_id_fail;
+
	ret = device_add(&mdev->dev);
if (ret)
goto add_fail;
@@ -340,6 +350,7 @@ int mdev_device_create(struct kobject *kobj,
  
  sysfs_fail:

	device_del(&mdev->dev);
+class_id_fail:
  add_fail:
parent->ops->remove(mdev);
  ops_create_fail:
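The fix Alex asks for can be modeled in plain C: warn and set ret to an error before jumping to class_id_fail, instead of leaving ret at 0 and reporting bogus success. This is a sketch of the logic only; -EINVAL and the message text are assumptions, not the final patch.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace model of the create path: if the parent driver never called
 * mdev_set_class(), warn and fail with an errno instead of silently
 * "succeeding" with a half-created device. */
struct mdev_device_model { unsigned short class_id; };

static int mdev_class_check(struct mdev_device_model *mdev)
{
	int ret = 0;

	if (!mdev->class_id) {
		/* In the kernel this would be a dev_warn()/WARN_ON(). */
		fprintf(stderr, "mdev: parent driver did not set a class id\n");
		ret = -EINVAL;	/* previously ret stayed 0 -> bogus success */
	}
	return ret;
}
```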

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC v1 1/2] vhost: option to fetch descriptors through an independent struct

2019-10-15 Thread Jason Wang


On 2019/10/16 4:20 AM, Michael S. Tsirkin wrote:

On Mon, Oct 14, 2019 at 09:43:25AM +0800, Jason Wang wrote:

On 2019/10/13 4:27 AM, Michael S. Tsirkin wrote:

On Sat, Oct 12, 2019 at 03:28:49PM +0800, Jason Wang wrote:

On 2019/10/11 9:45 PM, Michael S. Tsirkin wrote:

The idea is to support multiple ring formats by converting
to a format-independent array of descriptors.

This costs extra cycles, but we gain in ability
to fetch a batch of descriptors in one go, which
is good for code cache locality.

To simplify benchmarking, I kept the old code
around so one can switch back and forth by
writing into a module parameter.
This will go away in the final submission.

This patch causes a minor performance degradation,
it's been kept as simple as possible for ease of review.
Next patch gets us back the performance by adding batching.

Signed-off-by: Michael S. Tsirkin 
---
drivers/vhost/test.c  |  17 ++-
drivers/vhost/vhost.c | 299 +-
drivers/vhost/vhost.h |  16 +++
3 files changed, 327 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 056308008288..39a018a7af2d 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -18,6 +18,9 @@
#include "test.h"
#include "vhost.h"
+static int newcode = 0;
+module_param(newcode, int, 0644);
+
/* Max number of bytes transferred before requeueing the job.
 * Using this limit prevents one virtqueue from starving others. */
#define VHOST_TEST_WEIGHT 0x8
@@ -58,10 +61,16 @@ static void handle_vq(struct vhost_test *n)
	vhost_disable_notify(&n->dev, vq);
for (;;) {
-   head = vhost_get_vq_desc(vq, vq->iov,
-                            ARRAY_SIZE(vq->iov),
-                            &out, &in,
-                            NULL, NULL);
+   if (newcode)
+           head = vhost_get_vq_desc_batch(vq, vq->iov,
+                                          ARRAY_SIZE(vq->iov),
+                                          &out, &in,
+                                          NULL, NULL);
+   else
+           head = vhost_get_vq_desc(vq, vq->iov,
+                                    ARRAY_SIZE(vq->iov),
+                                    &out, &in,
+                                    NULL, NULL);
/* On error, stop handling until the next kick. */
if (unlikely(head < 0))
break;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 36ca2cf419bf..36661d6cb51f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -301,6 +301,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
   struct vhost_virtqueue *vq)
{
vq->num = 1;
+   vq->ndescs = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -369,6 +370,9 @@ static int vhost_worker(void *data)
static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
{
+   kfree(vq->descs);
+   vq->descs = NULL;
+   vq->max_descs = 0;
kfree(vq->indirect);
vq->indirect = NULL;
kfree(vq->log);
@@ -385,6 +389,10 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
+   vq->max_descs = dev->iov_limit;
+   vq->descs = kmalloc_array(vq->max_descs,
+ sizeof(*vq->descs),
+ GFP_KERNEL);

Is iov_limit too much here? It obviously increases the footprint. I guess
batching can only be done for descriptors without the indirect or next
flags set, so we could batch 16 or 64 instead.

Thanks

Yes, next patch only batches up to 64.  But we do need iov_limit because
guest can pass a long chain of scatter/gather.
We already have iovecs in a huge array so this does not look like
a big deal. If we ever teach the code to avoid the huge
iov arrays by handling huge s/g lists piece by piece,
we can make the desc array smaller at the same point.


Another possible issue: if we try to batch a descriptor chain when we've
already batched some descriptors, we may hit the limit, and then some of
the descriptors might need to be re-read.

Or do we need a circular index (head, tail) in this case?

Thanks

We never supported more than IOV_MAX descriptors,
and we don't batch more than iov_limit - IOV_MAX,
so the buffer never overflows.


Ok, but what happens when we've already batched 63 descriptors and then a
3-descriptor chain arrives?


And it looks to me like we need to forget the cached descriptors during
set_vring_base().


Thanks
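One way to answer the 63-descriptors-then-a-3-chain question Jason raises is to flush the batch whenever an incoming chain would not fit, so a chain is never split across the limit and nothing needs re-reading. A hedged sketch with hypothetical names, not the actual vhost code:

```c
#define BATCH_MAX 64

/* Model: a batch accumulates up to BATCH_MAX format-independent
 * descriptors; a chain that would overflow triggers a flush first,
 * so no chain is ever split across a flush boundary. */
struct batch {
	int ndescs;
	int flushes;	/* counted for illustration only */
};

static void batch_flush(struct batch *b)
{
	if (b->ndescs) {
		b->flushes++;
		b->ndescs = 0;
	}
}

/* Returns 0 on success; a chain longer than BATCH_MAX cannot be batched. */
static int batch_add_chain(struct batch *b, int chain_len)
{
	if (chain_len > BATCH_MAX)
		return -1;
	if (b->ndescs + chain_len > BATCH_MAX)
		batch_flush(b);	/* make room before reading the chain */
	b->ndescs += chain_len;
	return 0;
}
```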



Re: [PATCH] vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'

2019-10-15 Thread David Miller
From: Stefano Garzarella 
Date: Tue, 15 Oct 2019 17:00:51 +0200

> The 'work' field was introduced with commit 06a8fc78367d0
> ("VSOCK: Introduce virtio_vsock_common.ko")
> but it is never used in the code, so we can remove it to save
> memory allocated in the per-packet 'struct virtio_vsock_pkt'
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Stefano Garzarella 

Michael, will you take this?  I assume so...

Thanks.


Re: [RFC 1/2] vhost: IFC VF hardware operation layer

2019-10-15 Thread Stephen Hemminger
On Wed, 16 Oct 2019 09:03:17 +0800
Zhu Lingshan  wrote:

> + IFC_INFO(&dev->dev, "PCI capability mapping:\n"
> + "common cfg: %p\n"
> + "notify base: %p\n"
> + "isr cfg: %p\n"
> + "device cfg: %p\n"
> + "multiplier: %u\n",
> + hw->common_cfg,
> + hw->notify_base,
> + hw->isr,
> + hw->dev_cfg,
> + hw->notify_off_multiplier);

Kernel messages go to syslog, and syslog does not handle multi-line
messages very well, so this should be a single line.

Also, this is the kind of message that should be at the debug
level; something that is useful to the driver developers
but not something that needs to be filling up every end user's log.
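Stephen's suggestion amounts to collapsing the multi-line IFC_INFO into one debug-level printk. A userspace model of the single-line format (the field names follow the quoted code; in the kernel this string would go through dev_dbg()):

```c
#include <stdio.h>

/* Build the one-line, newline-terminated record that syslog can handle,
 * in place of the five-line IFC_INFO block from the quoted patch. */
static int format_cap_msg(char *buf, size_t len,
			  const void *common, const void *notify,
			  const void *isr, const void *dev_cfg,
			  unsigned int mult)
{
	return snprintf(buf, len,
			"PCI capability mapping: common cfg: %p, "
			"notify base: %p, isr cfg: %p, device cfg: %p, "
			"multiplier: %u\n",
			common, notify, isr, dev_cfg, mult);
}
```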


Re: [RFC 1/2] vhost: IFC VF hardware operation layer

2019-10-15 Thread Stephen Hemminger
On Wed, 16 Oct 2019 09:03:17 +0800
Zhu Lingshan  wrote:

> +int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *dev)
> +{
> + int ret;
> + u8 pos;
> + struct virtio_pci_cap cap;
> + u32 i;
> + u16 notify_off;

For network code, the preferred declaration style is
reverse Christmas tree (longest declaration first, shortest last).
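For illustration, the declarations from the quoted ifcvf_init_hw() reordered into reverse Christmas tree style (a sketch only; the struct stub and uintN_t types stand in for the kernel's virtio_pci_cap and u8/u16/u32):

```c
#include <stdint.h>

struct virtio_pci_cap { uint8_t cap_vndr; };	/* stand-in for the kernel type */

static int init_hw_decls(void)
{
	struct virtio_pci_cap cap;	/* longest line first */
	uint16_t notify_off;
	uint32_t i;
	uint8_t pos;
	int ret;			/* shortest last */

	(void)cap; (void)notify_off; (void)i; (void)pos;
	ret = 0;
	return ret;
}
```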


Re: [PULL] vhost: cleanups and fixes

2019-10-15 Thread pr-tracker-bot
The pull request you sent on Tue, 15 Oct 2019 17:19:08 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3b1f00aceb7a67bf079a5a64aa5c6baf78a8f442

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[PULL] vhost: cleanups and fixes

2019-10-15 Thread Michael S. Tsirkin
The following changes since commit da0c9ea146cbe92b832f1b0f694840ea8eb33cce:

  Linux 5.4-rc2 (2019-10-06 14:27:30 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 245cdd9fbd396483d501db83047116e2530f245f:

  vhost/test: stop device before reset (2019-10-13 09:38:27 -0400)


virtio: fixes

Some minor bugfixes

Signed-off-by: Michael S. Tsirkin 


Michael S. Tsirkin (3):
  tools/virtio: more stubs
  tools/virtio: xen stub
  vhost/test: stop device before reset

 drivers/vhost/test.c | 2 ++
 tools/virtio/crypto/hash.h   | 0
 tools/virtio/linux/dma-mapping.h | 2 ++
 tools/virtio/xen/xen.h   | 6 ++
 4 files changed, 10 insertions(+)
 create mode 100644 tools/virtio/crypto/hash.h
 create mode 100644 tools/virtio/xen/xen.h


Re: [PATCH RFC v1 1/2] vhost: option to fetch descriptors through an independent struct

2019-10-15 Thread Michael S. Tsirkin
On Mon, Oct 14, 2019 at 09:43:25AM +0800, Jason Wang wrote:

[PATCH 4/5] virtiofs: Count pending forgets as in_flight forgets

2019-10-15 Thread Vivek Goyal
If the virtqueue is full, we put forget requests on a list and dispatch
them later from a worker. As of now we don't count these forgets in the
fsvq->in_flight variable. This means that when a queue is being drained,
we need special logic to first drain these pending requests and then wait
for fsvq->in_flight to go to zero.

By counting pending forgets in fsvq->in_flight, we can get rid of the
special logic and just wait for in_flight to reach zero. The worker thread
will kick in and drain all the forgets anyway, driving in_flight to zero.

I also need similar logic for the normal request queue in the next patch,
where I defer request submission to worker context if the queue is full.

This simplifies the code a bit.

Also add two helper functions to inc/dec in_flight. The decrement helper
will later be used to signal completion when in_flight reaches zero.
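The two helpers this patch adds can be modeled in plain C, with the WARN_ON in the decrement helper turned into an assert. A sketch only; locking is elided, since the kernel versions run under fsvq->lock:

```c
#include <assert.h>

/* Model of fsvq->in_flight accounting: pending forgets now count as
 * in flight too, so draining reduces to waiting for in_flight == 0. */
struct fsvq_model { int in_flight; };

static void inc_in_flight_req(struct fsvq_model *fsvq)
{
	fsvq->in_flight++;
}

static void dec_in_flight_req(struct fsvq_model *fsvq)
{
	assert(fsvq->in_flight > 0);	/* WARN_ON(fsvq->in_flight <= 0) */
	fsvq->in_flight--;
}
```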

Signed-off-by: Vivek Goyal 
---
 fs/fuse/virtio_fs.c | 44 
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index e0fcf3030951..625de45fa471 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -67,6 +67,19 @@ static inline struct fuse_pqueue *vq_to_fpq(struct virtqueue 
*vq)
	return &vq_to_fsvq(vq)->fud->pq;
 }
 
+/* Should be called with fsvq->lock held. */
+static inline void inc_in_flight_req(struct virtio_fs_vq *fsvq)
+{
+   fsvq->in_flight++;
+}
+
+/* Should be called with fsvq->lock held. */
+static inline void dec_in_flight_req(struct virtio_fs_vq *fsvq)
+{
+   WARN_ON(fsvq->in_flight <= 0);
+   fsvq->in_flight--;
+}
+
 static void release_virtio_fs_obj(struct kref *ref)
 {
struct virtio_fs *vfs = container_of(ref, struct virtio_fs, refcount);
@@ -110,22 +123,6 @@ static void virtio_fs_drain_queue(struct virtio_fs_vq 
*fsvq)
	flush_delayed_work(&fsvq->dispatch_work);
 }
 
-static inline void drain_hiprio_queued_reqs(struct virtio_fs_vq *fsvq)
-{
-   struct virtio_fs_forget *forget;
-
-   spin_lock(&fsvq->lock);
-   while (1) {
-   forget = list_first_entry_or_null(&fsvq->queued_reqs,
-   struct virtio_fs_forget, list);
-   if (!forget)
-   break;
-   list_del(&forget->list);
-   kfree(forget);
-   }
-   spin_unlock(&fsvq->lock);
-}
-
 static void virtio_fs_drain_all_queues(struct virtio_fs *fs)
 {
struct virtio_fs_vq *fsvq;
@@ -133,9 +130,6 @@ static void virtio_fs_drain_all_queues(struct virtio_fs *fs)
 
for (i = 0; i < fs->nvqs; i++) {
fsvq = &fs->vqs[i];
-   if (i == VQ_HIPRIO)
-   drain_hiprio_queued_reqs(fsvq);
-
virtio_fs_drain_queue(fsvq);
}
 }
@@ -254,7 +248,7 @@ static void virtio_fs_hiprio_done_work(struct work_struct 
*work)
 
while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
kfree(req);
-   fsvq->in_flight--;
+   dec_in_flight_req(fsvq);
}
} while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq)));
spin_unlock(&fsvq->lock);
@@ -306,6 +300,7 @@ static void virtio_fs_hiprio_dispatch_work(struct 
work_struct *work)
 
list_del(&forget->list);
if (!fsvq->connected) {
+   dec_in_flight_req(fsvq);
spin_unlock(&fsvq->lock);
kfree(forget);
continue;
@@ -327,13 +322,13 @@ static void virtio_fs_hiprio_dispatch_work(struct 
work_struct *work)
} else {
pr_debug("virtio-fs: Could not queue FORGET: 
err=%d. Dropping it.\n",
 ret);
+   dec_in_flight_req(fsvq);
kfree(forget);
}
spin_unlock(&fsvq->lock);
return;
}
 
-   fsvq->in_flight++;
notify = virtqueue_kick_prepare(vq);
spin_unlock(&fsvq->lock);
 
@@ -472,7 +467,7 @@ static void virtio_fs_requests_done_work(struct work_struct 
*work)
 
fuse_request_end(fc, req);
spin_lock(&fsvq->lock);
-   fsvq->in_flight--;
+   dec_in_flight_req(fsvq);
spin_unlock(&fsvq->lock);
}
 }
@@ -730,6 +725,7 @@ __releases(fiq->lock)
list_add_tail(&forget->list, &fsvq->queued_reqs);
schedule_delayed_work(&fsvq->dispatch_work,
msecs_to_jiffies(1));
+   inc_in_flight_req(fsvq);
} else {
pr_debug("virtio-fs: Could not queue FORGET: err=%d. 
Dropping it.\n",
 ret);
@@ -739,7 +735,7 @@ __releases(fiq->lock)
goto out;
}
 
-   fsvq->in_flight++;
+   

[PATCH 3/5] virtiofs: Set FR_SENT flag only after request has been sent

2019-10-15 Thread Vivek Goyal
The FR_SENT flag should be set only after the request has been successfully
sent over the virtqueue. It is used by the interrupt logic to figure out
whether an interrupt request should be sent.

Also add the request to the fpq->processing list only after sending it
successfully.

Signed-off-by: Vivek Goyal 
---
 fs/fuse/virtio_fs.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 3b7f7409e77b..e0fcf3030951 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -857,6 +857,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
unsigned int i;
int ret;
bool notify;
+   struct fuse_pqueue *fpq;
 
/* Does the sglist fit on the stack? */
total_sgs = sg_count_fuse_req(req);
@@ -911,6 +912,15 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
goto out;
}
 
+   /* Request successfully sent. */
+   fpq = &fsvq->fud->pq;
+   spin_lock(&fpq->lock);
+   list_add_tail(&req->list, fpq->processing);
+   spin_unlock(&fpq->lock);
+   set_bit(FR_SENT, &req->flags);
+   /* matches barrier in request_wait_answer() */
+   smp_mb__after_atomic();
+
fsvq->in_flight++;
notify = virtqueue_kick_prepare(vq);
 
@@ -939,7 +949,6 @@ __releases(fiq->lock)
struct virtio_fs *fs;
struct fuse_conn *fc;
struct fuse_req *req;
-   struct fuse_pqueue *fpq;
struct virtio_fs_vq *fsvq;
int ret;
 
@@ -958,14 +967,6 @@ __releases(fiq->lock)
 req->in.h.nodeid, req->in.h.len,
 fuse_len_args(req->args->out_numargs, req->args->out_args));
 
-   fpq = &fs->vqs[queue_id].fud->pq;
-   spin_lock(&fpq->lock);
-   list_add_tail(&req->list, fpq->processing);
-   spin_unlock(&fpq->lock);
-   set_bit(FR_SENT, &req->flags);
-   /* matches barrier in request_wait_answer() */
-   smp_mb__after_atomic();
-
 retry:
fsvq = &fs->vqs[queue_id];
ret = virtio_fs_enqueue_req(fsvq, req);
@@ -978,10 +979,6 @@ __releases(fiq->lock)
}
req->out.h.error = ret;
pr_err("virtio-fs: virtio_fs_enqueue_req() failed %d\n", ret);
-   spin_lock(&fpq->lock);
-   clear_bit(FR_SENT, &req->flags);
-   list_del_init(&req->list);
-   spin_unlock(&fpq->lock);
 
/* Can't end request in submission context. Use a worker */
spin_lock(&fsvq->lock);
-- 
2.20.1



[PATCH 0/5] virtiofs: Fix couple of deadlocks

2019-10-15 Thread Vivek Goyal
Hi,

We have a couple of places which can result in deadlock. This patch series
fixes them.

We can be called with fc->bg_lock (for background requests) while
submitting a request. This leads to two constraints.

- We can't end requests in submitter's context and call fuse_end_request()
  as it tries to take fc->bg_lock as well. So queue these requests on a
  list and use a worker to end these requests.

- If the virtqueue is full, we can wait with fc->bg_lock held for the queue
  to have space. The worker which is completing requests gets blocked on
  fc->bg_lock as well. That means requests are not completing, which means
  descriptors are not being freed, which means the submitter can't make
  progress. Deadlock.

  Fix this by punting the requests to a list and retry submission later
  with the help of a worker.
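The second constraint can be modeled without any kernel machinery: with fc->bg_lock held, a full queue must never be retried inline; the request is parked for a worker instead. A hedged sketch with toy names:

```c
#include <errno.h>

/* Toy fixed-capacity queue standing in for the virtqueue. */
struct toy_vq { int used, cap; };

static int toy_enqueue(struct toy_vq *vq)
{
	if (vq->used == vq->cap)
		return -ENOSPC;	/* virtqueue full */
	vq->used++;
	return 0;
}

/* Submission path: on -ENOSPC, never spin or sleep here (we may hold
 * fc->bg_lock); record a deferral for a worker to retry instead. */
static int toy_submit(struct toy_vq *vq, int *deferred)
{
	int ret = toy_enqueue(vq);

	if (ret == -ENOSPC) {
		(*deferred)++;	/* models list_add_tail + schedule_delayed_work */
		return 0;	/* submitter returns without blocking */
	}
	return ret;
}
```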

Thanks
Vivek

Vivek Goyal (5):
  virtiofs: Do not end request in submission context
  virtiofs: No need to check fpq->connected state
  virtiofs: Set FR_SENT flag only after request has been sent
  virtiofs: Count pending forgets as in_flight forgets
  virtiofs: Retry request submission from worker context

 fs/fuse/virtio_fs.c | 165 +---
 1 file changed, 111 insertions(+), 54 deletions(-)

-- 
2.20.1



[PATCH 1/5] virtiofs: Do not end request in submission context

2019-10-15 Thread Vivek Goyal
The submission context can hold locks which the end-request code tries to
take again, and a deadlock can occur. For example, fc->bg_lock: a background
request being submitted holds fc->bg_lock, and if we could not submit the
request (because the device went away) and tried to end it, a deadlock
happens. During testing, I also got a warning from the deadlock detection
code.

So put requests on a list and end requests from a worker thread.

I got the following warning from the deadlock detector.

[  603.137138] WARNING: possible recursive locking detected
[  603.137142] 
[  603.137144] blogbench/2036 is trying to acquire lock:
[  603.137149] f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: 
fuse_request_end+0xdf/0x1c0 [fuse]
[  603.140701]
[  603.140701] but task is already holding lock:
[  603.140703] f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: 
fuse_simple_background+0x92/0x1d0 [fuse]
[  603.140713]
[  603.140713] other info that might help us debug this:
[  603.140714]  Possible unsafe locking scenario:
[  603.140714]
[  603.140715]CPU0
[  603.140716]
[  603.140716]   lock(&(&fc->bg_lock)->rlock);
[  603.140718]   lock(&(&fc->bg_lock)->rlock);
[  603.140719]
[  603.140719]  *** DEADLOCK ***

Signed-off-by: Vivek Goyal 
---
 fs/fuse/virtio_fs.c | 38 ++
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 6af3f131e468..24ac6f8bf3f7 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -30,6 +30,7 @@ struct virtio_fs_vq {
struct virtqueue *vq; /* protected by ->lock */
struct work_struct done_work;
struct list_head queued_reqs;
+   struct list_head end_reqs;  /* End these requests */
struct delayed_work dispatch_work;
struct fuse_dev *fud;
bool connected;
@@ -259,8 +260,27 @@ static void virtio_fs_hiprio_done_work(struct work_struct 
*work)
spin_unlock(&fsvq->lock);
 }
 
-static void virtio_fs_dummy_dispatch_work(struct work_struct *work)
+static void virtio_fs_request_dispatch_work(struct work_struct *work)
 {
+   struct fuse_req *req;
+   struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
+dispatch_work.work);
+   struct fuse_conn *fc = fsvq->fud->fc;
+
+   pr_debug("virtio-fs: worker %s called.\n", __func__);
+   while (1) {
+   spin_lock(&fsvq->lock);
+   req = list_first_entry_or_null(&fsvq->end_reqs, struct fuse_req,
+  list);
+   if (!req) {
+   spin_unlock(&fsvq->lock);
+   return;
+   }
+
+   list_del_init(&req->list);
+   spin_unlock(&fsvq->lock);
+   fuse_request_end(fc, req);
+   }
 }
 
 static void virtio_fs_hiprio_dispatch_work(struct work_struct *work)
@@ -502,6 +522,7 @@ static int virtio_fs_setup_vqs(struct virtio_device *vdev,
names[VQ_HIPRIO] = fs->vqs[VQ_HIPRIO].name;
INIT_WORK(&fs->vqs[VQ_HIPRIO].done_work, virtio_fs_hiprio_done_work);
INIT_LIST_HEAD(&fs->vqs[VQ_HIPRIO].queued_reqs);
+   INIT_LIST_HEAD(&fs->vqs[VQ_HIPRIO].end_reqs);
INIT_DELAYED_WORK(&fs->vqs[VQ_HIPRIO].dispatch_work,
virtio_fs_hiprio_dispatch_work);
spin_lock_init(&fs->vqs[VQ_HIPRIO].lock);
@@ -511,8 +532,9 @@ static int virtio_fs_setup_vqs(struct virtio_device *vdev,
spin_lock_init(&fs->vqs[i].lock);
INIT_WORK(&fs->vqs[i].done_work, virtio_fs_requests_done_work);
INIT_DELAYED_WORK(&fs->vqs[i].dispatch_work,
-   virtio_fs_dummy_dispatch_work);
+ virtio_fs_request_dispatch_work);
INIT_LIST_HEAD(&fs->vqs[i].queued_reqs);
+   INIT_LIST_HEAD(&fs->vqs[i].end_reqs);
snprintf(fs->vqs[i].name, sizeof(fs->vqs[i].name),
 "requests.%u", i - VQ_REQUEST);
callbacks[i] = virtio_fs_vq_done;
@@ -918,6 +940,7 @@ __releases(fiq->lock)
struct fuse_conn *fc;
struct fuse_req *req;
struct fuse_pqueue *fpq;
+   struct virtio_fs_vq *fsvq;
int ret;
 
WARN_ON(list_empty(&fiq->pending));
@@ -951,7 +974,8 @@ __releases(fiq->lock)
smp_mb__after_atomic();
 
 retry:
-   ret = virtio_fs_enqueue_req(&fs->vqs[queue_id], req);
+   fsvq = &fs->vqs[queue_id];
+   ret = virtio_fs_enqueue_req(fsvq, req);
if (ret < 0) {
if (ret == -ENOMEM || ret == -ENOSPC) {
/* Virtqueue full. Retry submission */
@@ -965,7 +989,13 @@ __releases(fiq->lock)
clear_bit(FR_SENT, &req->flags);
list_del_init(&req->list);
spin_unlock(&fpq->lock);
-   fuse_request_end(fc, req);
+
+   /* Can't end request in submission context. Use a worker */
+   

[PATCH 5/5] virtiofs: Retry request submission from worker context

2019-10-15 Thread Vivek Goyal
If the regular request queue gets full, currently we sleep for a bit and
retry submission in the submitter's context. This assumes the submitter is
not holding any spin lock. But that assumption is not true for background
requests, which are submitted with fc->bg_lock held.

This can lead to a deadlock where one thread is trying submission with
fc->bg_lock held while the request completion thread has called
fuse_request_end(), which tries to acquire fc->bg_lock and gets blocked. As
the completion thread is blocked, it makes no further progress, the queue
never empties, and the submitter can't submit more requests.

To solve this, retry submission with the help of a worker instead of
retrying in the submitter's context. We already do this for hiprio/forget
requests.

Reported-by: Chirantan Ekbote 
Signed-off-by: Vivek Goyal 
---
 fs/fuse/virtio_fs.c | 59 ++---
 1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 625de45fa471..58e568ef54ef 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -55,6 +55,9 @@ struct virtio_fs_forget {
struct list_head list;
 };
 
+static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
+struct fuse_req *req, bool in_flight);
+
 static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq)
 {
struct virtio_fs *fs = vq->vdev->priv;
@@ -260,6 +263,7 @@ static void virtio_fs_request_dispatch_work(struct 
work_struct *work)
struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
 dispatch_work.work);
struct fuse_conn *fc = fsvq->fud->fc;
+   int ret;
 
pr_debug("virtio-fs: worker %s called.\n", __func__);
while (1) {
@@ -268,13 +272,43 @@ static void virtio_fs_request_dispatch_work(struct 
work_struct *work)
   list);
if (!req) {
spin_unlock(&fsvq->lock);
-   return;
+   break;
}
 
list_del_init(&req->list);
spin_unlock(&fsvq->lock);
fuse_request_end(fc, req);
}
+
+   /* Dispatch pending requests */
+   while (1) {
+   spin_lock(&fsvq->lock);
+   req = list_first_entry_or_null(&fsvq->queued_reqs,
+  struct fuse_req, list);
+   if (!req) {
+   spin_unlock(&fsvq->lock);
+   return;
+   }
+   list_del_init(&req->list);
+   spin_unlock(&fsvq->lock);
+
+   ret = virtio_fs_enqueue_req(fsvq, req, true);
+   if (ret < 0) {
+   if (ret == -ENOMEM || ret == -ENOSPC) {
+   spin_lock(&fsvq->lock);
+   list_add_tail(&req->list, &fsvq->queued_reqs);
+   schedule_delayed_work(&fsvq->dispatch_work,
+ msecs_to_jiffies(1));
+   spin_unlock(&fsvq->lock);
+   return;
+   }
+   req->out.h.error = ret;
+   dec_in_flight_req(fsvq);
+   pr_err("virtio-fs: virtio_fs_enqueue_req() failed %d\n",
+  ret);
+   fuse_request_end(fc, req);
+   }
+   }
 }
 
 static void virtio_fs_hiprio_dispatch_work(struct work_struct *work)
@@ -837,7 +871,7 @@ static unsigned int sg_init_fuse_args(struct scatterlist 
*sg,
 
 /* Add a request to a virtqueue and kick the device */
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
-struct fuse_req *req)
+struct fuse_req *req, bool in_flight)
 {
/* requests need at least 4 elements */
struct scatterlist *stack_sgs[6];
@@ -917,7 +951,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
/* matches barrier in request_wait_answer() */
smp_mb__after_atomic();
 
-   inc_in_flight_req(fsvq);
+   if (!in_flight)
+   inc_in_flight_req(fsvq);
notify = virtqueue_kick_prepare(vq);
 
spin_unlock(&fsvq->lock);
@@ -963,15 +998,21 @@ __releases(fiq->lock)
 req->in.h.nodeid, req->in.h.len,
 fuse_len_args(req->args->out_numargs, req->args->out_args));
 
-retry:
fsvq = &fs->vqs[queue_id];
-   ret = virtio_fs_enqueue_req(fsvq, req);
+   ret = virtio_fs_enqueue_req(fsvq, req, false);
if (ret < 0) {
if (ret == -ENOMEM || ret == -ENOSPC) {
-   /* Virtqueue full. Retry submission */
-   /* TODO use completion instead of timeout */
-   usleep_range(20, 30);
-   goto retry;
+   

[PATCH 2/5] virtiofs: No need to check fpq->connected state

2019-10-15 Thread Vivek Goyal
In virtiofs we keep per-queue connected state in virtio_fs_vq->connected
and use that to end requests if the queue is not connected. And virtiofs
does not even touch fpq->connected.

We probably need to merge these two at some point. For now, simplify the
code a bit and do not worry about checking the state of fpq->connected.

Signed-off-by: Vivek Goyal 
---
 fs/fuse/virtio_fs.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 24ac6f8bf3f7..3b7f7409e77b 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -960,13 +960,6 @@ __releases(fiq->lock)
 
fpq = &fs->vqs[queue_id].fud->pq;
spin_lock(&fpq->lock);
-   if (!fpq->connected) {
-   spin_unlock(&fpq->lock);
-   req->out.h.error = -ENODEV;
-   pr_err("virtio-fs: %s disconnected\n", __func__);
-   fuse_request_end(fc, req);
-   return;
-   }
list_add_tail(&req->list, fpq->processing);
spin_unlock(&fpq->lock);
set_bit(FR_SENT, &req->flags);
-- 
2.20.1



Re: [PATCH V3 4/7] mdev: introduce device specific ops

2019-10-15 Thread Alex Williamson
On Tue, 15 Oct 2019 20:17:01 +0800
Jason Wang  wrote:

> On 2019/10/15 6:41 PM, Cornelia Huck wrote:
> > On Fri, 11 Oct 2019 16:15:54 +0800
> > Jason Wang  wrote:
> >  
> >> Currently, except for the create and remove, the rest of
> >> mdev_parent_ops is designed for vfio-mdev driver only and may not help
> >> for kernel mdev driver. With the help of class id, this patch
> >> introduces device specific callbacks inside mdev_device
> >> structure. This allows different set of callback to be used by
> >> vfio-mdev and virtio-mdev.
> >>
> >> Signed-off-by: Jason Wang 
> >> ---
> >>   .../driver-api/vfio-mediated-device.rst   | 22 +---
> >>   MAINTAINERS   |  1 +
> >>   drivers/gpu/drm/i915/gvt/kvmgt.c  | 18 ---
> >>   drivers/s390/cio/vfio_ccw_ops.c   | 18 ---
> >>   drivers/s390/crypto/vfio_ap_ops.c | 14 +++--
> >>   drivers/vfio/mdev/mdev_core.c |  9 +++-
> >>   drivers/vfio/mdev/mdev_private.h  |  1 +
> >>   drivers/vfio/mdev/vfio_mdev.c | 37 ++---
> >>   include/linux/mdev.h  | 42 +++
> >>   include/linux/vfio_mdev.h | 52 +++
> >>   samples/vfio-mdev/mbochs.c| 20 ---
> >>   samples/vfio-mdev/mdpy.c  | 21 +---
> >>   samples/vfio-mdev/mtty.c  | 18 ---
> >>   13 files changed, 177 insertions(+), 96 deletions(-)
> >>   create mode 100644 include/linux/vfio_mdev.h
> >>
> >> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> >> b/Documentation/driver-api/vfio-mediated-device.rst
> >> index 2035e48da7b2..553574ebba73 100644
> >> --- a/Documentation/driver-api/vfio-mediated-device.rst
> >> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> >> @@ -152,11 +152,20 @@ callbacks per mdev parent device, per mdev type, or 
> >> any other categorization.
> >>   Vendor drivers are expected to be fully asynchronous in this respect or
> >>   provide their own internal resource protection.)
> >>   
> >> -The callbacks in the mdev_parent_ops structure are as follows:
> >> +In order to support multiple types of device/driver, device needs to
> >> +provide both class_id and device_ops through:  
> > "As multiple types of mediated devices may be supported, the device
> > needs to set up the class id and the device specific callbacks via:"
> >
> > ?
> >  
> >>   
> >> -* open: open callback of mediated device
> >> -* close: close callback of mediated device
> >> -* ioctl: ioctl callback of mediated device
> >> +void mdev_set_class(struct mdev_device *mdev, u16 id, const void 
> >> *ops);
> >> +
> >> +The class_id is used to be paired with ids in id_table in mdev_driver
> >> +structure for probing the correct driver.  
> > "The class id  (specified in id) is used to match a device with an mdev
> > driver via its id table."
> >
> > ?
> >  
> >> The device_ops is device
> >> +specific callbacks which can be get through mdev_get_dev_ops()
> >> +function by mdev bus driver.  
> > "The device specific callbacks (specified in *ops) are obtainable via
> > mdev_get_dev_ops() (for use by the mdev bus driver.)"
> >
> > ?
> >  
> >> For vfio-mdev device, its device specific
> >> +ops are as follows:  
> > "A vfio-mdev device (class id MDEV_ID_VFIO) uses the following
> > device-specific ops:"
> >
> > ?  
> 
> 
> All you propose is better than what I wrote, will change the docs.
> 
> 
> >  
> >> +
> >> +* open: open callback of vfio mediated device
> >> +* close: close callback of vfio mediated device
> >> +* ioctl: ioctl callback of vfio mediated device
> >>   * read : read emulation callback
> >>   * write: write emulation callback
> >>   * mmap: mmap emulation callback
> >> @@ -167,9 +176,10 @@ register itself with the mdev core driver::
> >>extern int  mdev_register_device(struct device *dev,
> >> const struct mdev_parent_ops *ops);
> >>   
> >> -It is also required to specify the class_id through::
> >> +It is also required to specify the class_id and device specific ops 
> >> through::
> >>   
> >> -  extern int mdev_set_class(struct device *dev, u16 id);
> >> +  extern int mdev_set_class(struct device *dev, u16 id,
> >> +const void *ops);  
> > Apologies if that has already been discussed, but do we want a 1:1
> > relationship between id and ops, or can different devices with the same
> > id register different ops?  
> 
> 
> I think we have an N:1 mapping between id and ops, e.g. we want both
> virtio-mdev and vhost-mdev to use a single set of device ops.

The contents of the ops structure is essentially defined by the id,
which is why I was leaning towards them being defined together.  They
are effectively interlocked, the id defines which mdev "endpoint"
driver is loaded and that driver requires mdev_get_dev_ops() to return
the structure required by the driver.  I wish 

Re: [PATCH V3 1/7] mdev: class id support

2019-10-15 Thread Alex Williamson
On Fri, 11 Oct 2019 16:15:51 +0800
Jason Wang  wrote:
  
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b558d4cfd082..724e9b9841d8 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -45,6 +45,12 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data)
>  }
>  EXPORT_SYMBOL(mdev_set_drvdata);
>  
> +void mdev_set_class(struct mdev_device *mdev, u16 id)
> +{
> + mdev->class_id = id;
> +}
> +EXPORT_SYMBOL(mdev_set_class);
> +
>  struct device *mdev_dev(struct mdev_device *mdev)
>  {
>   return &mdev->dev;
> @@ -135,6 +141,7 @@ static int mdev_device_remove_cb(struct device *dev, void 
> *data)
>   * mdev_register_device : Register a device
>   * @dev: device structure representing parent device.
>   * @ops: Parent device operation structure to be registered.
> + * @id: class id.
>   *
>   * Add device to list of registered parent devices.
>   * Returns a negative value on error, otherwise 0.
> @@ -324,6 +331,9 @@ int mdev_device_create(struct kobject *kobj,
>   if (ret)
>   goto ops_create_fail;
>  
> + if (!mdev->class_id)

This is a sanity test failure of the parent driver on a privileged
path, I think it's fair to print a warning when this occurs rather than
only return an errno to the user.  In fact, ret is not set to an error
value here, so it looks like this fails to create the device but
returns success.  Thanks,

Alex

> + goto class_id_fail;
> +
>   ret = device_add(&mdev->dev);
>   if (ret)
>   goto add_fail;
> @@ -340,6 +350,7 @@ int mdev_device_create(struct kobject *kobj,
>  
>  sysfs_fail:
>   device_del(&mdev->dev);
> +class_id_fail:
>  add_fail:
>   parent->ops->remove(mdev);
>  ops_create_fail:


[PATCH] vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'

2019-10-15 Thread Stefano Garzarella
The 'work' field was introduced with commit 06a8fc78367d0
("VSOCK: Introduce virtio_vsock_common.ko")
but it is never used in the code, so we can remove it to save
memory allocated in the per-packet 'struct virtio_vsock_pkt'

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Stefano Garzarella 
---
 include/linux/virtio_vsock.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 4c7781f4b29b..07875ccc7bb5 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -48,7 +48,6 @@ struct virtio_vsock_sock {
 
 struct virtio_vsock_pkt {
struct virtio_vsock_hdr hdr;
-   struct work_struct work;
struct list_head list;
/* socket refcnt not held, only use for cancellation */
struct vsock_sock *vsk;
-- 
2.21.0



0-copy (was Re: question: asynchronous notification from vhost)

2019-10-15 Thread Guennadi Liakhovetski
On Tue, Oct 15, 2019 at 11:23:13AM +0200, Guennadi Liakhovetski wrote:
> Hi,
> 
> I'm developing a virtualised audio / DSP virtio and vhost driver pair 
> and I'm currently somewhat stuck trying to figure out how to 
> asynchronously notify the guest from the vhost driver. I'm using the 
> vhost_add_used_and_signal() function to return data back to the guest 
> in the guest context, when the guest initiated an operation, that's 
> working well. But how do I "kick" the guest from an asynchronous, e.g. 
> from an interrupt thread context?

I think I've solved that by using a vhost work queue. That brings me 
one step further, possibly, to the last larger problem to solve: the 
actual data transmission. My preference would be to use zero-copy 
for that, but so far I only see one example of zero-copy in 
drivers/vhost - in net.c and only in TX direction. As far as I 
understand that isn't a principal limitation of the vhost / virtio 
API, rather it's related to certain networking specifics. Am I 
right? Is it supposed to work in both directions, or are there
problems?

Thanks
Guennadi
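For anyone hitting the same question: the work-queue hand-off described above can be sketched in userspace with pthreads. This is only an approximation of the pattern — in the kernel the relevant helpers are vhost_work_init() and vhost_work_queue() from drivers/vhost/vhost.h, and the queued function would call vhost_add_used_and_signal():

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace approximation of the vhost work-queue pattern: instead of
 * signalling the guest straight from interrupt context, the interrupt
 * handler queues a work item and a dedicated worker thread runs it. */
struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	void (*fn)(void *);
	void *arg;
	bool stop;
};

static volatile int kicked;

static void wq_queue(struct work_queue *wq, void (*fn)(void *), void *arg)
{
	pthread_mutex_lock(&wq->lock);
	wq->fn = fn;
	wq->arg = arg;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

static void *wq_worker(void *opaque)
{
	struct work_queue *wq = opaque;

	pthread_mutex_lock(&wq->lock);
	while (!wq->stop) {
		if (wq->fn) {
			void (*fn)(void *) = wq->fn;
			void *arg = wq->arg;

			wq->fn = NULL;
			pthread_mutex_unlock(&wq->lock);
			fn(arg);	/* safe context to "kick" the guest */
			pthread_mutex_lock(&wq->lock);
		} else {
			pthread_cond_wait(&wq->cond, &wq->lock);
		}
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static void kick_guest(void *arg)
{
	(void)arg;
	kicked = 1;	/* stand-in for vhost_add_used_and_signal() */
}

static int run_demo(void)
{
	struct work_queue wq = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.cond = PTHREAD_COND_INITIALIZER,
	};
	pthread_t t;

	pthread_create(&t, NULL, wq_worker, &wq);
	wq_queue(&wq, kick_guest, NULL);	/* "interrupt thread" side */
	while (!kicked)
		;	/* demo-only busy wait for the worker */
	pthread_mutex_lock(&wq.lock);
	wq.stop = true;
	pthread_cond_signal(&wq.cond);
	pthread_mutex_unlock(&wq.lock);
	pthread_join(t, NULL);
	return kicked;
}
```

The point of the indirection is the same as in vhost: the producer side only takes a lock and signals a condition variable, while the function that actually notifies the guest always runs in the worker's (process) context.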


Re: [PATCH V3 0/7] mdev based hardware virtio offloading support

2019-10-15 Thread Stefan Hajnoczi
On Tue, Oct 15, 2019 at 11:37:17AM +0800, Jason Wang wrote:
> 
> On 2019/10/15 上午1:49, Stefan Hajnoczi wrote:
> > On Fri, Oct 11, 2019 at 04:15:50PM +0800, Jason Wang wrote:
> > > There is hardware that can do virtio datapath offloading while having
> > > its own control path. This patch series tries to implement an mdev based
> > > unified API to support using kernel virtio driver to drive those
> > > devices. This is done by introducing a new mdev transport for virtio
> > > (virtio_mdev) and register itself as a new kind of mdev driver. Then
> > > it provides a unified way for kernel virtio driver to talk with mdev
> > > device implementation.
> > > 
> > > Though the series only contains kernel driver support, the goal is to
> > > make the transport generic enough to support userspace drivers. This
> > > means vhost-mdev[1] could be built on top as well by reusing the
> > > transport.
> > > 
> > > A sample driver is also implemented which simulates a virtio-net
> > > loopback ethernet device on top of vringh + workqueue. This could be
> > > used as a reference implementation for real hardware driver.
> > > 
> > > Considering the mdev framework only supports VFIO devices and drivers
> > > right now, this series also extends it to support other types. This is
> > > done by introducing a class id to the device and pairing it with the
> > > id_table claimed by the driver. On top, this series also decouples the
> > > device specific parent ops out of the common ones.
> > I was curious so I took a quick look and posted comments.
> > 
> > I guess this driver runs inside the guest since it registers virtio
> > devices?
> 
> 
> It could run in either guest or host. But the main focus is to run in the
> host then we can use virtio drivers in containers.
> 
> 
> > 
> > If this is used with physical PCI devices that support datapath
> > offloading then how are physical devices presented to the guest without
> > SR-IOV?
> 
> 
> We will do control path mediation through vhost-mdev[1] and vhost-vfio[2].
> Then we will present a full virtio compatible ethernet device for guest.
> 
> SR-IOV is not a must, any mdev device that implements the API defined in
> patch 5 can be used by this framework.

What I'm trying to understand is: if you want to present a virtio-pci
device to the guest (e.g. using vhost-mdev or vhost-vfio), then how is
that related to this patch series?

Does this mean this patch series is useful mostly for presenting virtio
devices to containers or the host?

Stefan



Re: [PATCH V3 4/7] mdev: introduce device specific ops

2019-10-15 Thread Jason Wang


On 2019/10/15 下午6:41, Cornelia Huck wrote:

On Fri, 11 Oct 2019 16:15:54 +0800
Jason Wang  wrote:


Currently, except for the create and remove, the rest of
mdev_parent_ops is designed for vfio-mdev driver only and may not help
for kernel mdev driver. With the help of class id, this patch
introduces device specific callbacks inside mdev_device
structure. This allows different set of callback to be used by
vfio-mdev and virtio-mdev.

Signed-off-by: Jason Wang 
---
  .../driver-api/vfio-mediated-device.rst   | 22 +---
  MAINTAINERS   |  1 +
  drivers/gpu/drm/i915/gvt/kvmgt.c  | 18 ---
  drivers/s390/cio/vfio_ccw_ops.c   | 18 ---
  drivers/s390/crypto/vfio_ap_ops.c | 14 +++--
  drivers/vfio/mdev/mdev_core.c |  9 +++-
  drivers/vfio/mdev/mdev_private.h  |  1 +
  drivers/vfio/mdev/vfio_mdev.c | 37 ++---
  include/linux/mdev.h  | 42 +++
  include/linux/vfio_mdev.h | 52 +++
  samples/vfio-mdev/mbochs.c| 20 ---
  samples/vfio-mdev/mdpy.c  | 21 +---
  samples/vfio-mdev/mtty.c  | 18 ---
  13 files changed, 177 insertions(+), 96 deletions(-)
  create mode 100644 include/linux/vfio_mdev.h

diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
b/Documentation/driver-api/vfio-mediated-device.rst
index 2035e48da7b2..553574ebba73 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -152,11 +152,20 @@ callbacks per mdev parent device, per mdev type, or any 
other categorization.
  Vendor drivers are expected to be fully asynchronous in this respect or
  provide their own internal resource protection.)
  
-The callbacks in the mdev_parent_ops structure are as follows:

+In order to support multiple types of device/driver, device needs to
+provide both class_id and device_ops through:

"As multiple types of mediated devices may be supported, the device
needs to set up the class id and the device specific callbacks via:"

?

  
-* open: open callback of mediated device

-* close: close callback of mediated device
-* ioctl: ioctl callback of mediated device
+void mdev_set_class(struct mdev_device *mdev, u16 id, const void *ops);
+
+The class_id is used to be paired with ids in id_table in mdev_driver
+structure for probing the correct driver.

"The class id  (specified in id) is used to match a device with an mdev
driver via its id table."

?


The device_ops is device
+specific callbacks which can be get through mdev_get_dev_ops()
+function by mdev bus driver.

"The device specific callbacks (specified in *ops) are obtainable via
mdev_get_dev_ops() (for use by the mdev bus driver.)"

?


For vfio-mdev device, its device specific
+ops are as follows:

"A vfio-mdev device (class id MDEV_ID_VFIO) uses the following
device-specific ops:"

?



All you propose is better than what I wrote, will change the docs.





+
+* open: open callback of vfio mediated device
+* close: close callback of vfio mediated device
+* ioctl: ioctl callback of vfio mediated device
  * read : read emulation callback
  * write: write emulation callback
  * mmap: mmap emulation callback
@@ -167,9 +176,10 @@ register itself with the mdev core driver::
extern int  mdev_register_device(struct device *dev,
 const struct mdev_parent_ops *ops);
  
-It is also required to specify the class_id through::

+It is also required to specify the class_id and device specific ops through::
  
-	extern int mdev_set_class(struct device *dev, u16 id);

+   extern int mdev_set_class(struct device *dev, u16 id,
+ const void *ops);

Apologies if that has already been discussed, but do we want a 1:1
relationship between id and ops, or can different devices with the same
id register different ops?



I think we have an N:1 mapping between id and ops, e.g. we want both
virtio-mdev and vhost-mdev to use a single set of device ops.


Thanks



If the former, would it make sense to first
register the ops for an id (once), and then have the ->create callback
only set the class id (with the core doing the lookup of the ops)?

  
  However, the mdev_parent_ops structure is not required in the function call
  that a driver should use to unregister itself with the mdev core driver::


Re: [PATCH V3 2/7] mdev: bus uevent support

2019-10-15 Thread Jason Wang


On 2019/10/15 下午6:27, Cornelia Huck wrote:

On Fri, 11 Oct 2019 16:15:52 +0800
Jason Wang  wrote:


This patch adds bus uevent support for mdev bus in order to allow
cooperation with userspace.

Signed-off-by: Jason Wang 
---
  drivers/vfio/mdev/mdev_driver.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
index b7c40ce86ee3..319d886ffaf7 100644
--- a/drivers/vfio/mdev/mdev_driver.c
+++ b/drivers/vfio/mdev/mdev_driver.c
@@ -82,9 +82,17 @@ static int mdev_match(struct device *dev, struct 
device_driver *drv)
return 0;
  }
  
+static int mdev_uevent(struct device *dev, struct kobj_uevent_env *env)

+{
+   struct mdev_device *mdev = to_mdev_device(dev);
+
+   return add_uevent_var(env, "MODALIAS=mdev:c%02X", mdev->class_id);
+}
+
  struct bus_type mdev_bus_type = {
.name   = "mdev",
.match  = mdev_match,
+   .uevent = mdev_uevent,
.probe  = mdev_probe,
.remove = mdev_remove,
  };

I'd merge that into the previous patch.



Will do.

Thanks
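For context, the MODALIAS value emitted by the uevent hook above is what userspace module autoloading keys on. A minimal userspace sketch of the format (plain C; the class id value 1 for MDEV_ID_VFIO is an assumption based on this series, not an upstream constant):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirror of the uevent format string used above: "mdev:c%02X".
 * MDEV_ID_VFIO == 1 is assumed here purely for illustration. */
static const char *mdev_modalias(unsigned int class_id)
{
	static char buf[16];

	snprintf(buf, sizeof(buf), "mdev:c%02X", class_id);
	return buf;
}
```

A driver servicing that class id would then be expected to expose a matching module alias (e.g. "mdev:c01") so it can be autoloaded when the uevent fires.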



Re: [PATCH V3 1/7] mdev: class id support

2019-10-15 Thread Jason Wang


On 2019/10/15 下午6:26, Cornelia Huck wrote:

On Fri, 11 Oct 2019 16:15:51 +0800
Jason Wang  wrote:


Mdev bus only supports vfio driver right now, so it doesn't implement
match method. But in the future, we may add drivers other than vfio,
the first driver could be virtio-mdev. This means we need to add
device class id support in bus match method to pair the mdev device
and mdev driver correctly.

So this patch adds id_table to mdev_driver and class_id for mdev
device with the match method for mdev bus.

Signed-off-by: Jason Wang 
---
  Documentation/driver-api/vfio-mediated-device.rst |  7 ++-
  drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
  drivers/s390/cio/vfio_ccw_ops.c   |  1 +
  drivers/s390/crypto/vfio_ap_ops.c |  1 +
  drivers/vfio/mdev/mdev_core.c | 11 +++
  drivers/vfio/mdev/mdev_driver.c   | 14 ++
  drivers/vfio/mdev/mdev_private.h  |  1 +
  drivers/vfio/mdev/vfio_mdev.c |  6 ++
  include/linux/mdev.h  |  8 
  include/linux/mod_devicetable.h   |  8 
  samples/vfio-mdev/mbochs.c|  1 +
  samples/vfio-mdev/mdpy.c  |  1 +
  samples/vfio-mdev/mtty.c  |  1 +
  13 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
b/Documentation/driver-api/vfio-mediated-device.rst
index 25eb7d5b834b..2035e48da7b2 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
* @probe: called when new device created
* @remove: called when device removed
* @driver: device driver structure
+  * @id_table: the ids serviced by this driver
*/
   struct mdev_driver {
 const char *name;
 int  (*probe)  (struct device *dev);
 void (*remove) (struct device *dev);
 struct device_driverdriver;
+const struct mdev_class_id *id_table;
   };
  
  A mediated bus driver for mdev should use this structure in the function calls

@@ -165,12 +167,15 @@ register itself with the mdev core driver::
extern int  mdev_register_device(struct device *dev,
 const struct mdev_parent_ops *ops);
  
+It is also required to specify the class_id through::
+
+	extern int mdev_set_class(struct device *dev, u16 id);

Should the document state explicitly that this should be done in the
->create() callback?



Yes, it's better.



Also, I think that the class_id might be different
for different mdevs (even if the parent is the same) -- should that be
mentioned explicitly?



Yes, depends on the mdev_supported_type.

Thanks





+
  However, the mdev_parent_ops structure is not required in the function call
  that a driver should use to unregister itself with the mdev core driver::
  
  	extern void mdev_unregister_device(struct device *dev);
  
-

  Mediated Device Management Interface Through sysfs
  ==
  

(...)

Looks reasonable to me.
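To make the id pairing concrete, here is a plain-C sketch of the match logic under discussion, with simplified stand-in types rather than the real kernel structures (the actual match callback lives in drivers/vfio/mdev/mdev_driver.c):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's mdev structures. */
struct mdev_class_id { uint16_t id; };		/* zero-terminated table */
struct mdev_device  { uint16_t class_id; };
struct mdev_driver  { const struct mdev_class_id *id_table; };

/* Returns 1 when the driver's id table contains the device's class id,
 * mirroring what the bus ->match() callback reports. */
static int mdev_match_sketch(const struct mdev_device *mdev,
			     const struct mdev_driver *drv)
{
	const struct mdev_class_id *ids = drv->id_table;

	for (; ids && ids->id; ids++)
		if (ids->id == mdev->class_id)
			return 1;
	return 0;
}

static int mdev_match_demo(void)
{
	static const struct mdev_class_id vfio_ids[] = { { 1 }, { 0 } };
	const struct mdev_driver vfio_drv = { vfio_ids };
	const struct mdev_device vfio_dev = { 1 }, virtio_dev = { 2 };

	/* vfio driver claims class id 1 only, so class id 2 must not bind */
	return mdev_match_sketch(&vfio_dev, &vfio_drv) == 1 &&
	       mdev_match_sketch(&virtio_dev, &vfio_drv) == 0;
}
```

With this shape, several devices can carry the same class id and be claimed by one driver, while a driver can also list several class ids in its table.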


Re: [PATCH net 0/2] vsock: don't allow half-closed socket in the host transports

2019-10-15 Thread Stefan Hajnoczi
On Sat, Oct 12, 2019 at 06:38:46PM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 11, 2019 at 04:34:57PM +0200, Stefano Garzarella wrote:
> > On Fri, Oct 11, 2019 at 10:19:13AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Oct 11, 2019 at 03:07:56PM +0200, Stefano Garzarella wrote:
> > > > We are implementing a test suite for the VSOCK sockets and we discovered
> > > > that vmci_transport never allowed half-closed socket on the host side.
> > > > 
> > > > As Jorgen explained [1] this is due to the implementation of VMCI.
> > > > 
> > > > Since we want to have the same behaviour across all transports, this
> > > > series adds a section in the "Implementation notes" to explain this
> > > > behaviour, and changes the vhost_transport to behave the same way.
> > > > 
> > > > [1] https://patchwork.ozlabs.org/cover/847998/#1831400
> > > 
> > > Half closed sockets are very useful, and lots of
> > > applications use tricks to swap a vsock for a tcp socket,
> > > which might as a result break.
> > 
> > Got it!
> > 
> > > 
> > > If VMCI really cares it can implement an ioctl to
> > > allow applications to detect that half closed sockets aren't supported.
> > > 
> > > It does not look like VMCI wants to bother (users do not read
> > > kernel implementation notes) so it does not really care.
> > > So why do we want to cripple other transports intentionally?
> > 
> > The main reason is that we are developing the test suite and we noticed
> > the mismatch. Since we want to make sure that applications behave in
> > the same way on different transports, we thought we would solve it that
> > way.
> > 
> > But what you are saying (also in the reply of the patches) is actually
> > quite right. Not being publicized, applications do not expect this behavior,
> > so please discard this series.
> > 
> > My problem during the tests, was trying to figure out if half-closed
> > sockets were supported or not, so as you say adding an IOCTL or maybe
> > better a getsockopt() could solve the problem.
> > 
> > What do you think?
> > 
> > Thanks,
> > Stefano
> 
> Sure, why not.

The aim is for applications using AF_VSOCK sockets to run on any
transport.  When the semantics differ between transports it creates a
compatibility problem.

That said, I do think keeping the standard sockets behavior is
reasonable.  If applications have problems on VMCI a sockopt may be
necessary :(.

Stefan
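As background for the discussion above, this is what the half-closed behaviour being debated looks like on ordinary stream sockets — a self-contained userspace sketch using an AF_UNIX socketpair (per the thread, AF_VSOCK on the vhost transport behaves this way today, while VMCI on the host side does not):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Standard half-close semantics: after one end does shutdown(SHUT_WR),
 * the peer reads EOF on that direction, but the opposite direction
 * keeps working. */
static int half_close_demo(void)
{
	int sv[2];
	char buf[8];
	int ok = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 0;

	shutdown(sv[0], SHUT_WR);			/* A closes its write side */

	if (read(sv[1], buf, sizeof(buf)) == 0 &&	/* B sees EOF */
	    write(sv[1], "ok", 2) == 2 &&		/* B can still send */
	    read(sv[0], buf, 2) == 2 &&			/* A can still receive */
	    memcmp(buf, "ok", 2) == 0)
		ok = 1;

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

A transport that forbids half-closed sockets would instead fail the write() or deliver a reset once either direction is shut down, which is the behavioural difference applications would trip over.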



Re: [PATCH V3 4/7] mdev: introduce device specific ops

2019-10-15 Thread Cornelia Huck
On Fri, 11 Oct 2019 16:15:54 +0800
Jason Wang  wrote:

> Currently, except for the create and remove, the rest of
> mdev_parent_ops is designed for vfio-mdev driver only and may not help
> for kernel mdev driver. With the help of class id, this patch
> introduces device specific callbacks inside mdev_device
> structure. This allows different set of callback to be used by
> vfio-mdev and virtio-mdev.
> 
> Signed-off-by: Jason Wang 
> ---
>  .../driver-api/vfio-mediated-device.rst   | 22 +---
>  MAINTAINERS   |  1 +
>  drivers/gpu/drm/i915/gvt/kvmgt.c  | 18 ---
>  drivers/s390/cio/vfio_ccw_ops.c   | 18 ---
>  drivers/s390/crypto/vfio_ap_ops.c | 14 +++--
>  drivers/vfio/mdev/mdev_core.c |  9 +++-
>  drivers/vfio/mdev/mdev_private.h  |  1 +
>  drivers/vfio/mdev/vfio_mdev.c | 37 ++---
>  include/linux/mdev.h  | 42 +++
>  include/linux/vfio_mdev.h | 52 +++
>  samples/vfio-mdev/mbochs.c| 20 ---
>  samples/vfio-mdev/mdpy.c  | 21 +---
>  samples/vfio-mdev/mtty.c  | 18 ---
>  13 files changed, 177 insertions(+), 96 deletions(-)
>  create mode 100644 include/linux/vfio_mdev.h
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> b/Documentation/driver-api/vfio-mediated-device.rst
> index 2035e48da7b2..553574ebba73 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -152,11 +152,20 @@ callbacks per mdev parent device, per mdev type, or any 
> other categorization.
>  Vendor drivers are expected to be fully asynchronous in this respect or
>  provide their own internal resource protection.)
>  
> -The callbacks in the mdev_parent_ops structure are as follows:
> +In order to support multiple types of device/driver, device needs to
> +provide both class_id and device_ops through:

"As multiple types of mediated devices may be supported, the device
needs to set up the class id and the device specific callbacks via:"

?

>  
> -* open: open callback of mediated device
> -* close: close callback of mediated device
> -* ioctl: ioctl callback of mediated device
> +void mdev_set_class(struct mdev_device *mdev, u16 id, const void *ops);
> +
> +The class_id is used to be paired with ids in id_table in mdev_driver
> +structure for probing the correct driver.

"The class id  (specified in id) is used to match a device with an mdev
driver via its id table."

?

> The device_ops is device
> +specific callbacks which can be get through mdev_get_dev_ops()
> +function by mdev bus driver. 

"The device specific callbacks (specified in *ops) are obtainable via
mdev_get_dev_ops() (for use by the mdev bus driver.)"

?

> For vfio-mdev device, its device specific
> +ops are as follows:

"A vfio-mdev device (class id MDEV_ID_VFIO) uses the following
device-specific ops:"

?

> +
> +* open: open callback of vfio mediated device
> +* close: close callback of vfio mediated device
> +* ioctl: ioctl callback of vfio mediated device
>  * read : read emulation callback
>  * write: write emulation callback
>  * mmap: mmap emulation callback
> @@ -167,9 +176,10 @@ register itself with the mdev core driver::
>   extern int  mdev_register_device(struct device *dev,
>const struct mdev_parent_ops *ops);
>  
> -It is also required to specify the class_id through::
> +It is also required to specify the class_id and device specific ops through::
>  
> - extern int mdev_set_class(struct device *dev, u16 id);
> + extern int mdev_set_class(struct device *dev, u16 id,
> +   const void *ops);

Apologies if that has already been discussed, but do we want a 1:1
relationship between id and ops, or can different devices with the same
id register different ops? If the former, would it make sense to first
register the ops for an id (once), and then have the ->create callback
only set the class id (with the core doing the lookup of the ops)?

>  
>  However, the mdev_parent_ops structure is not required in the function call
>  that a driver should use to unregister itself with the mdev core driver::


Re: [PATCH V3 2/7] mdev: bus uevent support

2019-10-15 Thread Cornelia Huck
On Fri, 11 Oct 2019 16:15:52 +0800
Jason Wang  wrote:

> This patch adds bus uevent support for mdev bus in order to allow
> cooperation with userspace.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/mdev_driver.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
> index b7c40ce86ee3..319d886ffaf7 100644
> --- a/drivers/vfio/mdev/mdev_driver.c
> +++ b/drivers/vfio/mdev/mdev_driver.c
> @@ -82,9 +82,17 @@ static int mdev_match(struct device *dev, struct 
> device_driver *drv)
>   return 0;
>  }
>  
> +static int mdev_uevent(struct device *dev, struct kobj_uevent_env *env)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + return add_uevent_var(env, "MODALIAS=mdev:c%02X", mdev->class_id);
> +}
> +
>  struct bus_type mdev_bus_type = {
>   .name   = "mdev",
>   .match  = mdev_match,
> + .uevent = mdev_uevent,
>   .probe  = mdev_probe,
>   .remove = mdev_remove,
>  };

I'd merge that into the previous patch.


Re: [PATCH V3 1/7] mdev: class id support

2019-10-15 Thread Cornelia Huck
On Fri, 11 Oct 2019 16:15:51 +0800
Jason Wang  wrote:

> Mdev bus only supports vfio driver right now, so it doesn't implement
> match method. But in the future, we may add drivers other than vfio,
> the first driver could be virtio-mdev. This means we need to add
> device class id support in bus match method to pair the mdev device
> and mdev driver correctly.
> 
> So this patch adds id_table to mdev_driver and class_id for mdev
> device with the match method for mdev bus.
> 
> Signed-off-by: Jason Wang 
> ---
>  Documentation/driver-api/vfio-mediated-device.rst |  7 ++-
>  drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
>  drivers/s390/cio/vfio_ccw_ops.c   |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c |  1 +
>  drivers/vfio/mdev/mdev_core.c | 11 +++
>  drivers/vfio/mdev/mdev_driver.c   | 14 ++
>  drivers/vfio/mdev/mdev_private.h  |  1 +
>  drivers/vfio/mdev/vfio_mdev.c |  6 ++
>  include/linux/mdev.h  |  8 
>  include/linux/mod_devicetable.h   |  8 
>  samples/vfio-mdev/mbochs.c|  1 +
>  samples/vfio-mdev/mdpy.c  |  1 +
>  samples/vfio-mdev/mtty.c  |  1 +
>  13 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
> b/Documentation/driver-api/vfio-mediated-device.rst
> index 25eb7d5b834b..2035e48da7b2 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
>* @probe: called when new device created
>* @remove: called when device removed
>* @driver: device driver structure
> +  * @id_table: the ids serviced by this driver
>*/
>   struct mdev_driver {
>const char *name;
>int  (*probe)  (struct device *dev);
>void (*remove) (struct device *dev);
>struct device_driverdriver;
> +  const struct mdev_class_id *id_table;
>   };
>  
>  A mediated bus driver for mdev should use this structure in the function 
> calls
> @@ -165,12 +167,15 @@ register itself with the mdev core driver::
>   extern int  mdev_register_device(struct device *dev,
>const struct mdev_parent_ops *ops);
>  
> +It is also required to specify the class_id through::
> +
> + extern int mdev_set_class(struct device *dev, u16 id);

Should the document state explicitly that this should be done in the
->create() callback? Also, I think that the class_id might be different
for different mdevs (even if the parent is the same) -- should that be
mentioned explicitly?

> +
>  However, the mdev_parent_ops structure is not required in the function call
>  that a driver should use to unregister itself with the mdev core driver::
>  
>   extern void mdev_unregister_device(struct device *dev);
>  
> -
>  Mediated Device Management Interface Through sysfs
>  ==
>  
(...)

Looks reasonable to me.


question: asynchronous notification from vhost

2019-10-15 Thread Guennadi Liakhovetski
Hi,

I'm developing a virtualised audio / DSP virtio and vhost driver pair 
and I'm currently somewhat stuck trying to figure out how to 
asynchronously notify the guest from the vhost driver. I'm using the 
vhost_add_used_and_signal() function to return data back to the guest 
in the guest context, when the guest initiated an operation, that's 
working well. But how do I "kick" the guest from an asynchronous, e.g. 
from an interrupt thread context?

Thanks
Guennadi



Re: [PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted

2019-10-15 Thread Christoph Hellwig
On Fri, Oct 11, 2019 at 06:25:19PM -0700, Ram Pai wrote:
> From: Thiago Jung Bauermann 
> 
> Normally, virtio enables DMA API with VIRTIO_F_IOMMU_PLATFORM, which must
> be set by both device and guest driver. However, as a hack, when DMA API
> returns physical addresses, guest driver can use the DMA API; even though
> device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical
> addresses.

Sorry, but this is a complete bullshit hack.  Drivers must always use
the DMA API if they do DMA, and if virtio devices use physical addresses
that needs to be returned through the platform firmware interfaces for
the dma setup.  If you don't do that yet (which based on previous
informations you don't), you need to fix it, and we can then quirk
old implementations that already are out in the field.

In other words: we finally need to fix that virtio mess and not pile
hacks on top of hacks.


Re: [PATCH 1/2] dma-mapping: Add dma_addr_is_phys_addr()

2019-10-15 Thread Christoph Hellwig
On Mon, Oct 14, 2019 at 11:29:24AM +0100, Robin Murphy wrote:
>> However, I would like to see the commit message (and maybe the inline
>> comments) expanded a bit on what the distinction here is about.  Some
>> of the text from the next patch would be suitable, about DMA addresses
>> usually being in a different address space but not in the case of
>> bounce buffering.
>
> Right, this needs a much tighter definition. "DMA address happens to be a 
> valid physical address" is true of various IOMMU setups too, but I can't 
> believe it's meaningful in such cases.
>
> If what you actually want is "DMA is direct or SWIOTLB" - i.e. "DMA address 
> is physical address of DMA data (not necessarily the original buffer)" - 
> wouldn't dma_is_direct() suffice?

It would.  But drivers have absolutely no business knowing any of this.


RE: [PATCH 1/2] dma-mapping: Add dma_addr_is_phys_addr()

2019-10-15 Thread Ram Pai
On Mon, Oct 14, 2019 at 11:29:24AM +0100, Robin Murphy wrote:
> On 14/10/2019 05:51, David Gibson wrote:
> >On Fri, Oct 11, 2019 at 06:25:18PM -0700, Ram Pai wrote:
> >>From: Thiago Jung Bauermann 
> >>
> >>In order to safely use the DMA API, virtio needs to know whether DMA
> >>addresses are in fact physical addresses and for that purpose,
> >>dma_addr_is_phys_addr() is introduced.
> >>
> >>cc: Benjamin Herrenschmidt 
> >>cc: David Gibson 
> >>cc: Michael Ellerman 
> >>cc: Paul Mackerras 
> >>cc: Michael Roth 
> >>cc: Alexey Kardashevskiy 
> >>cc: Paul Burton 
> >>cc: Robin Murphy 
> >>cc: Bartlomiej Zolnierkiewicz 
> >>cc: Marek Szyprowski 
> >>cc: Christoph Hellwig 
> >>Suggested-by: Michael S. Tsirkin 
> >>Signed-off-by: Ram Pai 
> >>Signed-off-by: Thiago Jung Bauermann 
> >
> >The change itself looks ok, so
> >
> >Reviewed-by: David Gibson 
> >
> >However, I would like to see the commit message (and maybe the inline
> >comments) expanded a bit on what the distinction here is about.  Some
> >of the text from the next patch would be suitable, about DMA addresses
> >usually being in a different address space but not in the case of
> >bounce buffering.
> 
> Right, this needs a much tighter definition. "DMA address happens to
> be a valid physical address" is true of various IOMMU setups too,
> but I can't believe it's meaningful in such cases.

The definition by itself is meaningful AFAICT. At its core it just
indicates whether DMA addresses are physical addresses or not.

However, it is up to the caller to use it meaningfully. For a non-virtio
caller, dma_addr_is_phys_addr() by itself may or may not be meaningful.

But for a virtio caller, this information is meaningful when paired with
mem_encrypt_active().

For IOMMU setups, the DMA API will get used regardless of whether the
DMA address happens to be a valid physical address.


> 
> If what you actually want is "DMA is direct or SWIOTLB" - i.e. "DMA
> address is physical address of DMA data (not necessarily the
> original buffer)" - wouldn't dma_is_direct() suffice?

This may also work, I think.  MST: thoughts?

RP
