Re: [PATCH v1 0/2] Virtio: fix some vq allocation issues

2019-01-02 Thread Wei Wang

On 12/28/2018 03:57 PM, Christian Borntraeger wrote:


On 28.12.2018 03:26, Wei Wang wrote:

Some vqs don't need to be allocated when the related feature bits are
disabled. Callers notice the vq allocation layer by setting the related
names[i] to be NULL.

This patch series fixes the find_vqs implementations to handle this case.

So the random crashes during boot are gone.
What still does not work is actually using the balloon.

So in the qemu monitor using lets say "balloon 1000"  will hang the guest.
Seems to be a deadlock in the virtio-ccw code.  We seem to call the
config code in the interrupt handler.


Please try with applying both this series and the "virtio-balloon: tweak 
config_changed"

series that I just sent.
This series fixes the ccw booting issue and that series tries to fix the 
ccw deadlock issue.


Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v1 2/2] virtio-balloon: improve update_balloon_size_func

2019-01-02 Thread Wei Wang
There is no need to update the balloon actual register when there is no
ballooning request. This patch avoids update_balloon_size when diff is 0.

Signed-off-by: Wei Wang 
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9a82a11..4884b85 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -435,9 +435,12 @@ static void update_balloon_size_func(struct work_struct 
*work)
  update_balloon_size_work);
diff = towards_target(vb);
 
+   if (!diff)
+   return;
+
if (diff > 0)
diff -= fill_balloon(vb, diff);
-   else if (diff < 0)
+   else
diff += leak_balloon(vb, -diff);
update_balloon_size(vb);
 
-- 
2.7.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v1 1/2] virtio-balloon: tweak config_changed implementation

2019-01-02 Thread Wei Wang
virtio-ccw has deadlock issues with reading config registers inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.

Signed-off-by: Wei Wang 
---
 drivers/virtio/virtio_balloon.c | 54 -
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 728ecd1..9a82a11 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -394,33 +394,15 @@ static void virtballoon_changed(struct virtio_device 
*vdev)
 {
struct virtio_balloon *vb = vdev->priv;
unsigned long flags;
-   s64 diff = towards_target(vb);
-
-   if (diff) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq,
-  >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
 
-   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-   virtio_cread(vdev, struct virtio_balloon_config,
-free_page_report_cmd_id, >cmd_id_received);
-   if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
-   /* Pass ULONG_MAX to give back all the free pages */
-   return_free_pages_to_mm(vb, ULONG_MAX);
-   } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
-  vb->cmd_id_received !=
-  virtio32_to_cpu(vdev, vb->cmd_id_active)) {
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update) {
-   queue_work(vb->balloon_wq,
-  >report_free_page_work);
-   }
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-   }
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update) {
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+   queue_work(vb->balloon_wq, >report_free_page_work);
}
+   spin_unlock_irqrestore(>stop_update_lock, flags);
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
@@ -637,11 +619,9 @@ static int send_free_pages(struct virtio_balloon *vb)
return 0;
 }
 
-static void report_free_page_func(struct work_struct *work)
+static void virtio_balloon_report_free_page(struct virtio_balloon *vb)
 {
int err;
-   struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
-report_free_page_work);
struct device *dev = >vdev->dev;
 
/* Start by sending the received cmd id to host with an outbuf. */
@@ -659,6 +639,24 @@ static void report_free_page_func(struct work_struct *work)
dev_err(dev, "Failed to send a stop id, err = %d\n", err);
 }
 
+static void report_free_page_func(struct work_struct *work)
+{
+   struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
+report_free_page_work);
+
+   virtio_cread(vb->vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+
+   if (vb->cmd_id_received == VIRTIO_BALLOON_CMD_ID_DONE) {
+   /* Pass ULONG_MAX to give back all the free pages */
+   return_free_pages_to_mm(vb, ULONG_MAX);
+   } else if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
+  vb->cmd_id_received !=
+  virtio32_to_cpu(vb->vdev, vb->cmd_id_active)) {
+   virtio_balloon_report_free_page(vb);
+   }
+}
+
 #ifdef CONFIG_BALLOON_COMPACTION
 /*
  * virtballoon_migratepage - perform the balloon page migration on behalf of
-- 
2.7.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v1 0/2] virtio-balloon: tweak config_changed

2019-01-02 Thread Wei Wang
Since virtio-ccw doesn't work with accessing to the config registers
inside an interrupt context, this patch series avoids that issue by
moving the config register accesses to the related workqueue contexts.

Wei Wang (2):
  virtio-balloon: tweak config_changed implementation
  virtio-balloon: improve update_balloon_size_func

 drivers/virtio/virtio_balloon.c | 59 +
 1 file changed, 30 insertions(+), 29 deletions(-)

-- 
2.7.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 0/4] barriers using data dependency

2019-01-02 Thread Michael S. Tsirkin
On Wed, Jan 02, 2019 at 04:36:40PM -0500, Alan Stern wrote:
> On Wed, 2 Jan 2019, Michael S. Tsirkin wrote:
> 
> > So as explained in Documentation/memory-barriers.txt e.g.
> > a load followed by a store require a full memory barrier,
> > to avoid store being ordered before the load.
> > Similarly load-load requires a read memory barrier.
> > 
> > Thinking about it, we can actually create a data dependency
> > by mixing the first loaded value into the pointer being
> > accessed.
> > 
> > This adds an API for this and uses it in virtio.
> > 
> > Written over the holiday and build tested only so far.
> 
> You are using the terminology from memory-barriers.txt, referring to
> the new dependency you create as a data dependency.  However,
> tools/memory-model/* uses a more precise name, calling it an address
> dependency.  Could you change the comments in the patches to use this
> name instead?

Sure, sounds good. While I'm at it, should memory-barriers.txt be
switched over too?

> > This patchset is also suboptimal on e.g. x86 where e.g. smp_rmb is a nop.
> 
> This should be easy to fix with an architecture-specific override.
> 
> Alan Stern

Absolutely. It does however mean that we'll need several
variants: mb/rmb, smp/dma/virt/mandatory.

I am still trying to decide whether it's good since it documents the
kind of barrier that we are trying to use - or bad since it's more
verbose and makes you choose one where they are all pretty cheap.

> > Sending out for early feedback/flames.
> > 
> > Michael S. Tsirkin (4):
> >   include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
> >   include/linux/compiler.h: allow memory operands
> >   barriers: convert a control to a data dependency
> >   virtio: use dependent_ptr_mb
> > 
> >  Documentation/memory-barriers.txt | 20 
> >  arch/alpha/include/asm/barrier.h  |  1 +
> >  drivers/virtio/virtio_ring.c  |  6 --
> >  include/asm-generic/barrier.h | 18 ++
> >  include/linux/compiler-clang.h|  5 ++---
> >  include/linux/compiler-gcc.h  |  4 
> >  include/linux/compiler-intel.h|  4 +---
> >  include/linux/compiler.h  |  8 +++-
> >  8 files changed, 53 insertions(+), 13 deletions(-)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-02 Thread Matthew Wilcox
On Wed, Jan 02, 2019 at 03:57:58PM -0500, Michael S. Tsirkin wrote:
> @@ -875,6 +893,8 @@ to the CPU containing it.  See the section on "Multicopy 
> atomicity"
>  for more information.
>  
>  
> +
> +
>  In summary:
>  
>(*) Control dependencies can order prior loads against later stores.

Was this hunk intentional?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-02 Thread Michael S. Tsirkin
On Wed, Jan 02, 2019 at 01:00:24PM -0800, Matthew Wilcox wrote:
> On Wed, Jan 02, 2019 at 03:57:58PM -0500, Michael S. Tsirkin wrote:
> > @@ -875,6 +893,8 @@ to the CPU containing it.  See the section on 
> > "Multicopy atomicity"
> >  for more information.
> >  
> >  
> > +
> > +
> >  In summary:
> >  
> >(*) Control dependencies can order prior loads against later stores.
> 
> Was this hunk intentional?

Nope, thanks for catching this.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PULL] virtio, vhost: features, fixes, cleanups

2019-01-02 Thread Michael S. Tsirkin
The following changes since commit 7566ec393f4161572ba6f11ad5171fd5d59b0fbd:

  Linux 4.20-rc7 (2018-12-16 15:46:55 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to a691ffb46edd7cb12a17ff0965ab59dbc95f48de:

  vhost: correct the related warning message (2018-12-19 18:23:50 -0500)


virtio, vhost: features, fixes, cleanups

discard in virtio blk
misc fixes and cleanups

Signed-off-by: Michael S. Tsirkin 


Changpeng Liu (1):
  virtio_blk: add discard and write zeroes support

Dongli Zhang (1):
  virtio: remove deprecated VIRTIO_PCI_CONFIG()

Michael S. Tsirkin (1):
  virtio: fix test build after uio.h change

Paolo Bonzini (1):
  vhost: split structs into a separate header file

Stefan Hajnoczi (1):
  vhost/vsock: switch to a mutex for vhost_vsock_hash

wangyan (1):
  vhost: correct the related warning message

 drivers/block/virtio_blk.c |  83 +++-
 drivers/vhost/scsi.c   |   4 +-
 drivers/vhost/vsock.c  |  16 ++---
 drivers/virtio/virtio_pci_legacy.c |   6 +-
 include/uapi/linux/vhost.h | 113 +---
 include/uapi/linux/vhost_types.h   | 128 +
 include/uapi/linux/virtio_blk.h|  54 
 tools/virtio/linux/kernel.h|   4 ++
 8 files changed, 283 insertions(+), 125 deletions(-)
 create mode 100644 include/uapi/linux/vhost_types.h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 4/4] virtio: use dependent_ptr_mb

2019-01-02 Thread Michael S. Tsirkin
Use dependent_ptr_mb which is - on some architectures -
more light-weight than an rmb.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_ring.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 814b395007b2..2d320396eff8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -702,6 +702,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned 
int *len,
void *ret;
unsigned int i;
u16 last_used;
+   bool more;
 
START_USE(vq);
 
@@ -710,14 +711,15 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, 
unsigned int *len,
return NULL;
}
 
-   if (!more_used(vq)) {
+   more = more_used(vq);
+   if (!more) {
pr_debug("No more buffers in queue\n");
END_USE(vq);
return NULL;
}
 
/* Only get used array entries after they have been exposed by host. */
-   virtio_rmb(vq->weak_barriers);
+   vq = dependent_ptr_mb(vq, more);
 
last_used = (vq->last_used_idx & (vq->vring.num - 1));
i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 3/4] barriers: convert a control to a data dependency

2019-01-02 Thread Michael S. Tsirkin
It's not uncommon to have two access two unrelated memory locations in a
specific order.  At the moment one has to use a memory barrier for this.

However, if the first access was a read and the second used an address
depending on the first one we would have a data dependency and no
barrier would be necessary.

This adds a new interface: dependent_ptr_mb which does exactly this: it
returns a pointer with a data dependency on the supplied value.

Signed-off-by: Michael S. Tsirkin 
---
 Documentation/memory-barriers.txt | 20 
 arch/alpha/include/asm/barrier.h  |  1 +
 include/asm-generic/barrier.h | 18 ++
 include/linux/compiler.h  |  4 
 4 files changed, 43 insertions(+)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index c1d913944ad8..9dbaa2e1dbf6 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -691,6 +691,18 @@ case what's actually required is:
p = READ_ONCE(b);
}
 
+Alternatively, a control dependency can be converted to a data dependency,
+e.g.:
+
+   q = READ_ONCE(a);
+   if (q) {
+   b = dependent_ptr_mb(b, q);
+   p = READ_ONCE(b);
+   }
+
+Note how the result of dependent_ptr_mb must be used with the following
+accesses in order to have an effect.
+
 However, stores are not speculated.  This means that ordering -is- provided
 for load-store control dependencies, as in the following example:
 
@@ -836,6 +848,12 @@ out-guess your code.  More generally, although READ_ONCE() 
does force
 the compiler to actually emit code for a given load, it does not force
 the compiler to use the results.
 
+Converting to a data dependency helps with this too:
+
+   q = READ_ONCE(a);
+   b = dependent_ptr_mb(b, q);
+   WRITE_ONCE(b, 1);
+
 In addition, control dependencies apply only to the then-clause and
 else-clause of the if-statement in question.  In particular, it does
 not necessarily apply to code following the if-statement:
@@ -875,6 +893,8 @@ to the CPU containing it.  See the section on "Multicopy 
atomicity"
 for more information.
 
 
+
+
 In summary:
 
   (*) Control dependencies can order prior loads against later stores.
diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h
index 92ec486a4f9e..b4934e8c551b 100644
--- a/arch/alpha/include/asm/barrier.h
+++ b/arch/alpha/include/asm/barrier.h
@@ -59,6 +59,7 @@
  * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
  * in cases like this where there are no data dependencies.
  */
+#define ARCH_NEEDS_READ_BARRIER_DEPENDS 1
 #define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
 
 #ifdef CONFIG_SMP
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 2cafdbb9ae4c..fa2e2ef72b68 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -70,6 +70,24 @@
 #define __smp_read_barrier_depends()   read_barrier_depends()
 #endif
 
+#if defined(COMPILER_HAS_OPTIMIZER_HIDE_VAR) && \
+   !defined(ARCH_NEEDS_READ_BARRIER_DEPENDS)
+
+#define dependent_ptr_mb(ptr, val) ({  \
+   long dependent_ptr_mb_val = (long)(val);\
+   long dependent_ptr_mb_ptr = (long)(ptr) - dependent_ptr_mb_val; \
+   \
+   BUILD_BUG_ON(sizeof(val) > sizeof(long));   \
+   OPTIMIZER_HIDE_VAR(dependent_ptr_mb_val);   \
+   (typeof(ptr))(dependent_ptr_mb_ptr + dependent_ptr_mb_val); \
+})
+
+#else
+
+#define dependent_ptr_mb(ptr, val) ({ mb(); (ptr); })
+
+#endif
+
 #ifdef CONFIG_SMP
 
 #ifndef smp_mb
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 6601d39e8c48..f599c30f1b28 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -152,9 +152,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, 
int val,
 #endif
 
 #ifndef OPTIMIZER_HIDE_VAR
+
 /* Make the optimizer believe the variable can be manipulated arbitrarily. */
 #define OPTIMIZER_HIDE_VAR(var)
\
__asm__ ("" : "=rm" (var) : "0" (var))
+
+#define COMPILER_HAS_OPTIMIZER_HIDE_VAR 1
+
 #endif
 
 /* Not-quite-unique ID. */
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 2/4] include/linux/compiler.h: allow memory operands

2019-01-02 Thread Michael S. Tsirkin
We don't really care whether the variable is in-register
or in-memory. Relax the constraint accordingly.

Signed-off-by: Michael S. Tsirkin 
---
 include/linux/compiler.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 1ad367b4cd8d..6601d39e8c48 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -154,7 +154,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int 
val,
 #ifndef OPTIMIZER_HIDE_VAR
 /* Make the optimizer believe the variable can be manipulated arbitrarily. */
 #define OPTIMIZER_HIDE_VAR(var)
\
-   __asm__ ("" : "=r" (var) : "0" (var))
+   __asm__ ("" : "=rm" (var) : "0" (var))
 #endif
 
 /* Not-quite-unique ID. */
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 1/4] include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR

2019-01-02 Thread Michael S. Tsirkin
Since commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h
mutually exclusive") clang no longer reuses the OPTIMIZER_HIDE_VAR macro
from compiler-gcc - instead it gets the version in
include/linux/compiler.h.  Unfortunately that version doesn't actually
prevent compiler from optimizing out the variable.

Fix up by moving the macro out from compiler-gcc.h to compiler.h.
Compilers without incline asm support will keep working
since it's protected by an ifdef.

Also fix up comments to match reality since we are no longer overriding
any macros.

Build-tested with gcc and clang.

Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually 
exclusive")
Cc: Eli Friedman 
Cc: Joe Perches 
Cc: Nick Desaulniers 
Cc: Linus Torvalds 
Signed-off-by: Michael S. Tsirkin 
---
 include/linux/compiler-clang.h | 5 ++---
 include/linux/compiler-gcc.h   | 4 
 include/linux/compiler-intel.h | 4 +---
 include/linux/compiler.h   | 4 +++-
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 3e7dafb3ea80..7ddaeb5182e3 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -3,9 +3,8 @@
 #error "Please don't include  directly, include 
 instead."
 #endif
 
-/* Some compiler specific definitions are overwritten here
- * for Clang compiler
- */
+/* Compiler specific definitions for Clang compiler */
+
 #define uninitialized_var(x) x = *(&(x))
 
 /* same as gcc, this was present in clang-2.6 so we can assume it works
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 2010493e1040..72054d9f0eaa 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -58,10 +58,6 @@
(typeof(ptr)) (__ptr + (off));  \
 })
 
-/* Make the optimizer believe the variable can be manipulated arbitrarily. */
-#define OPTIMIZER_HIDE_VAR(var)
\
-   __asm__ ("" : "=r" (var) : "0" (var))
-
 /*
  * A trick to suppress uninitialized variable warning without generating any
  * code
diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h
index 517bd14e1222..b17f3cd18334 100644
--- a/include/linux/compiler-intel.h
+++ b/include/linux/compiler-intel.h
@@ -5,9 +5,7 @@
 
 #ifdef __ECC
 
-/* Some compiler specific definitions are overwritten here
- * for Intel ECC compiler
- */
+/* Compiler specific definitions for Intel ECC compiler */
 
 #include 
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 06396c1cf127..1ad367b4cd8d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -152,7 +152,9 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int 
val,
 #endif
 
 #ifndef OPTIMIZER_HIDE_VAR
-#define OPTIMIZER_HIDE_VAR(var) barrier()
+/* Make the optimizer believe the variable can be manipulated arbitrarily. */
+#define OPTIMIZER_HIDE_VAR(var)
\
+   __asm__ ("" : "=r" (var) : "0" (var))
 #endif
 
 /* Not-quite-unique ID. */
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 0/4] barriers using data dependency

2019-01-02 Thread Michael S. Tsirkin
So as explained in Documentation/memory-barriers.txt e.g.
a load followed by a store require a full memory barrier,
to avoid store being ordered before the load.
Similarly load-load requires a read memory barrier.

Thinking about it, we can actually create a data dependency
by mixing the first loaded value into the pointer being
accessed.

This adds an API for this and uses it in virtio.

Written over the holiday and build tested only so far.

This patchset is also suboptimal on e.g. x86 where e.g. smp_rmb is a nop.

Sending out for early feedback/flames.

Michael S. Tsirkin (4):
  include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
  include/linux/compiler.h: allow memory operands
  barriers: convert a control to a data dependency
  virtio: use dependent_ptr_mb

 Documentation/memory-barriers.txt | 20 
 arch/alpha/include/asm/barrier.h  |  1 +
 drivers/virtio/virtio_ring.c  |  6 --
 include/asm-generic/barrier.h | 18 ++
 include/linux/compiler-clang.h|  5 ++---
 include/linux/compiler-gcc.h  |  4 
 include/linux/compiler-intel.h|  4 +---
 include/linux/compiler.h  |  8 +++-
 8 files changed, 53 insertions(+), 13 deletions(-)

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH V3 0/5] Hi:

2019-01-02 Thread Michael S. Tsirkin
On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> This series tries to access virtqueue metadata through kernel virtual
> address instead of copy_user() friends since they had too much
> overheads like checks, spec barriers or even hardware feature
> toggling.

Will review, thanks!
One questions that comes to mind is whether it's all about bypassing
stac/clac.  Could you please include a performance comparison with
nosmap?

> 
> Test shows about 24% improvement on TX PPS. It should benefit other
> cases as well.
>
> Changes from V2:
> - fix buggy range overlapping check
> - tear down MMU notifier during vhost ioctl to make sure invalidation
>   request can read metadata userspace address and vq size without
>   holding vq mutex.
> Changes from V1:
> - instead of pinning pages, use MMU notifier to invalidate vmaps and
>   remap duing metadata prefetch
> - fix build warning on MIPS
> 
> Jason Wang (5):
>   vhost: generalize adding used elem
>   vhost: fine grain userspace memory accessors
>   vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
>   vhost: introduce helpers to get the size of metadata area
>   vhost: access vq metadata through kernel virtual address
> 
>  drivers/vhost/net.c   |   4 +-
>  drivers/vhost/vhost.c | 416 +-
>  drivers/vhost/vhost.h |  15 +-
>  3 files changed, 384 insertions(+), 51 deletions(-)
> 
> -- 
> 2.17.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 1/1] s390/virtio: handle find on invalid queue gracefully

2019-01-02 Thread Cornelia Huck
On Wed,  2 Jan 2019 18:50:20 +0100
Halil Pasic  wrote:

> A queue with a capacity of zero is clearly not a valid virtio queue.
> Some emulators report zero queue size if queried with an invalid queue
> index. Instead of crashing in this case let us just return -EINVAL. To
> make that work properly, let us fix the notifier cleanup logic as well.
> 
> Signed-off-by: Halil Pasic 
> ---
> 
> This patch is motivated by commit 86a5597 "virtio-balloon:
> VIRTIO_BALLOON_F_FREE_PAGE_HINT" (Wei Wang, 2018-08-27) which triggered
> the described scenario.  The emulator in question is the current QEMU.
> The problem we run into is the underflow in the following loop
> in  __vring_new_virtqueue():
> for (i = 0; i < vring.num-1; i++)
>   vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1)
> Namely vring.num is an unsigned int.
> 
> RFC because I'm not sure about -EINVAL being a good choice, and about
> us caring about what happens if a virtio driver misbehaves like described.

For virtio-pci, the spec says that a zero queue size means that the
queue is unavailable. I don't think we have specified that explicitly
for virtio-ccw, but it does make sense.

virtio-pci returns -ENOENT in that case, which might be a good choice
here as well.

> 
> ---
>  drivers/s390/virtio/virtio_ccw.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index fc9dbad476c0..147927ed4fca 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -272,6 +272,8 @@ static void virtio_ccw_drop_indicators(struct 
> virtio_ccw_device *vcdev)
>  {
>   struct virtio_ccw_vq_info *info;
>  
> + if (!vcdev->airq_info)
> + return;

Which case is this guarding against? names[i] was NULL for every index?

>   list_for_each_entry(info, >virtqueues, node)
>   drop_airq_indicator(info->vq, vcdev->airq_info);
>  }
> @@ -514,6 +516,10 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
> virtio_device *vdev,
>   err = info->num;
>   goto out_err;
>   }
> + if (info->num == 0) {
> + err = -EINVAL;
> + goto out_err;
> + }
>   size = PAGE_ALIGN(vring_size(info->num, KVM_VIRTIO_CCW_RING_ALIGN));
>   info->queue = alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
>   if (info->queue == NULL) {

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] RE: [PATCH v1 0/2] Virtio: fix some vq allocation issues

2019-01-02 Thread Cornelia Huck
On Wed, 2 Jan 2019 16:59:19 +0100
Halil Pasic  wrote:

> On Wed, 2 Jan 2019 14:23:38 +0100
> Cornelia Huck  wrote:
> 
> > On Wed, 2 Jan 2019 10:53:14 +0100
> > Cornelia Huck  wrote:
> >   
> > > On Tue, 1 Jan 2019 00:40:19 +0100
> > > Halil Pasic  wrote:  

> > > > AFAICT tweaking the balloon code may be simpler than tweaking the
> > > > virtio-ccw (transport code). ccw_io_helper() relies on getting
> > > > an interrupt when the issued IO is done. If virtio-ccw is buggy, it
> > > > needs to be fixed, but I'm not sure it is.
> > > 
> > > I would not call virtio-ccw buggy, but it has some constraints that
> > > virtio-pci apparently doesn't have (and which did not show up so far;
> > > e.g. virtio-blk schedules a work item on config change, so there's no
> > > deadlock there.)
> > > 
> > > One way to get out of that constraint (don't interact with the config
> > > space directly in the config changed handler) would be to schedule a
> > > work item in virtio-ccw that calls virtio_config_changed() for the
> > > device. My understanding is that delaying the notification to a work
> > > queue would be fine.  
> > 
> > Unfortunately, calling virtio_config_changed() from a work item is not
> > enough: That function takes the config_lock, and the virtio-ccw code to
> > get the config both needs to allocate some memory and call schedule :/
> > 
> > The best option really seems to be
> > - have virtio-balloon move the handling of the config change onto a
> >   workqueue or something like that, and
> > - document that you cannot read/write the virtio config space from an
> >   atomic context
> > 
> > Unless someone has a better idea?
> >   
> 
> I wonder, would making config_lock a mutex suffice?

Unless I'm mistaken, you can't take a mutex in an interrupt path.

> I've already hinted that a virtio-balloon side fix is probably the
> simpler variant. 

Yes, I think so as well.

> I agree, let's fix the regression first, and think about wether to teach
> virtio-ccw to allow config manipulation from virtio_config_changed() or
> not later.

Or whether we can tweak the virtio code instead. But I agree, let's get
things working again first.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 1/2] virtio-net: bql support

2019-01-02 Thread Michael S. Tsirkin
On Wed, Jan 02, 2019 at 11:28:43AM +0800, Jason Wang wrote:
> 
> On 2018/12/31 上午2:45, Michael S. Tsirkin wrote:
> > On Thu, Dec 27, 2018 at 06:00:36PM +0800, Jason Wang wrote:
> > > On 2018/12/26 下午11:19, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > > > On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
> > > > > > When use_napi is set, let's enable BQLs.  Note: some of the issues 
> > > > > > are
> > > > > > similar to wifi.  It's worth considering whether something similar 
> > > > > > to
> > > > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > > > benefitial.
> > > > > I've played a similar patch several days before. The tricky part is 
> > > > > the mode
> > > > > switching between napi and no napi. We should make sure when the 
> > > > > packet is
> > > > > sent and trakced by BQL,  it should be consumed by BQL as well. I did 
> > > > > it by
> > > > > tracking it through skb->cb.  And deal with the freeze by reset the 
> > > > > BQL
> > > > > status. Patch attached.
> > > > > 
> > > > > But when testing with vhost-net, I don't very a stable performance,
> > > > So how about increasing TSQ pacing shift then?
> > > 
> > > I can test this. But changing default TCP value is much more than a
> > > virtio-net specific thing.
> > Well same logic as wifi applies. Unpredictable latencies related
> > to radio in one case, to host scheduler in the other.
> > 
> > > > > it was
> > > > > probably because we batch the used ring updating so tx interrupt may 
> > > > > come
> > > > > randomly. We probably need to implement time bounded coalescing 
> > > > > mechanism
> > > > > which could be configured from userspace.
> > > > I don't think it's reasonable to expect userspace to be that smart ...
> > > > Why do we need time bounded? used ring is always updated when ring
> > > > becomes empty.
> > > 
> > > We don't add used when means BQL may not see the consumed packet in time.
> > > And the delay varies based on the workload since we count packets not 
> > > bytes
> > > or time before doing the batched updating.
> > > 
> > > Thanks
> > Sorry I still don't get it.
> > When nothing is outstanding then we do update the used.
> > So if BQL stops userspace from sending packets then
> > we get an interrupt and packets start flowing again.
> 
> 
> Yes, but how about the cases of multiple flows. That's where I see unstable
> results.
> 
> 
> > 
> > It might be suboptimal, we might need to tune it but I doubt running
> > timers is a solution, timer interrupts cause VM exits.
> 
> 
> Probably not a timer but a time counter (or event byte counter) in vhost to
> add used and signal guest if it exceeds a value instead of waiting the
> number of packets.
> 
> 
> Thanks

Well we already have VHOST_NET_WEIGHT - is it too big then?

And maybe we should expose the "MORE" flag in the descriptor -
do you think that will help?



> 
> > 
> > > > > Btw, maybe it's time just enable napi TX by default. I get ~10% TCP_RR
> > > > > regression on machine without APICv, (haven't found time to test APICv
> > > > > machine). But consider it was for correctness, I think it's 
> > > > > acceptable? Then
> > > > > we can do optimization on top?
> > > > > 
> > > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > > Signed-off-by: Michael S. Tsirkin 
> > > > > > ---
> > > > > > drivers/net/virtio_net.c | 27 +++
> > > > > > 1 file changed, 19 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index cecfd77c9f3c..b657bde6b94b 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -1325,7 +1325,8 @@ static int virtnet_receive(struct 
> > > > > > receive_queue *rq, int budget,
> > > > > > return stats.packets;
> > > > > > }
> > > > > > -static void free_old_xmit_skbs(struct send_queue *sq)
> > > > > > +static void free_old_xmit_skbs(struct send_queue *sq, struct 
> > > > > > netdev_queue *txq,
> > > > > > +  bool use_napi)
> > > > > > {
> > > > > > struct sk_buff *skb;
> > > > > > unsigned int len;
> > > > > > @@ -1347,6 +1348,9 @@ static void free_old_xmit_skbs(struct 
> > > > > > send_queue *sq)
> > > > > > if (!packets)
> > > > > > return;
> > > > > > +   if (use_napi)
> > > > > > +   netdev_tx_completed_queue(txq, packets, bytes);
> > > > > > +
> > > > > > u64_stats_update_begin(>stats.syncp);
> > > > > > sq->stats.bytes += bytes;
> > > > > > sq->stats.packets += packets;
> > > > > > @@ -1364,7 +1368,7 @@ static void virtnet_poll_cleantx(struct 
> > > > > > receive_queue *rq)
> > > > > > return;
> > > > > > if (__netif_tx_trylock(txq)) {
> > > > > > -   free_old_xmit_skbs(sq);
> > > > > > +   free_old_xmit_skbs(sq, 

Re: [PATCH RFC 1/2] virtio-net: bql support

2019-01-02 Thread Michael S. Tsirkin
On Wed, Jan 02, 2019 at 11:30:11AM +0800, Jason Wang wrote:
> 
> On 2018/12/31 上午2:48, Michael S. Tsirkin wrote:
> > On Thu, Dec 27, 2018 at 06:04:53PM +0800, Jason Wang wrote:
> > > On 2018/12/26 下午11:22, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 06, 2018 at 04:17:36PM +0800, Jason Wang wrote:
> > > > > On 2018/12/6 上午6:54, Michael S. Tsirkin wrote:
> > > > > > When use_napi is set, let's enable BQLs.  Note: some of the issues 
> > > > > > are
> > > > > > similar to wifi.  It's worth considering whether something similar 
> > > > > > to
> > > > > > commit 36148c2bbfbe ("mac80211: Adjust TSQ pacing shift") might be
> > > > > > benefitial.
> > > > > I've played a similar patch several days before. The tricky part is 
> > > > > the mode
> > > > > switching between napi and no napi. We should make sure when the 
> > > > > packet is
> > > > > sent and trakced by BQL,  it should be consumed by BQL as well.
> > > > I just went over the patch again and I don't understand this comment.
> > > > This patch only enabled BQL with tx napi.
> > > > 
> > > > Thus there's no mode switching.
> > > > 
> > > > What did I miss?
> > > Consider the case:
> > > 
> > > 
> > > TX NAPI is disabled:
> > > 
> > > send N packets
> > > 
> > > turn TX NAPI on:
> > > 
> > > get tx interrupt
> > > 
> > > BQL try to consume those packets when lead WARN for dql.
> > > 
> > > 
> > > Thanks
> > Can one really switch tx napi on and off? How?
> > While root can change the napi_tx module parameter, I don't think
> > that has any effect outside device probe time. What did I miss?
> > 
> > 
> > 
> 
> We support switch the mode through ethtool recently. See
> 
> commit 0c465be183c7c57a26446df6ea96d8676b865f92
> Author: Jason Wang 
> Date:   Tue Oct 9 10:06:26 2018 +0800
> 
>     virtio_net: ethtool tx napi configuration
> 
>     Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
>     Interrupt moderation is currently not supported, so these accept and
>     display the default settings of 0 usec and 1 frame.
> 
>     Toggle tx napi through setting tx-frames. So as to not interfere
>     with possible future interrupt moderation, value 1 means tx napi while
>     value 0 means not.
> 
>     Only allow the switching when device is down for simplicity.
> 
>     Link: https://patchwork.ozlabs.org/patch/948149/
>     Suggested-by: Jason Wang 
>     Signed-off-by: Willem de Bruijn 
>     Signed-off-by: Jason Wang 
>     Signed-off-by: David S. Miller 
> 
> Thanks


It's disabled when device is up - isn't that enough?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [virtio-dev] RE: [PATCH v1 0/2] Virtio: fix some vq allocation issues

2019-01-02 Thread Cornelia Huck
On Wed, 2 Jan 2019 10:53:14 +0100
Cornelia Huck  wrote:

> On Tue, 1 Jan 2019 00:40:19 +0100
> Halil Pasic  wrote:

> > As I said, at the moment I don't have a preference regarding the fix,
> > partly because I'm not sure if "reading config inside the handler" is OK
> > or not. Maybe Connie or Michael can help us here. I'm however sure that
> > commit 86a5597 "virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT"
> > breaks virtio-balloon with the ccw transport (i.e. effectively breaks 
> > virtio-balloon on s390): it used to work before and does not work
> > after.  
> 
> Yes, that's unfortunate.
> 
> > 
> > AFAICT tweaking the balloon code may be simpler than tweaking the
> > virtio-ccw (transport code). ccw_io_helper() relies on getting
> > an interrupt when the issued IO is done. If virtio-ccw is buggy, it
> > needs to be fixed, but I'm not sure it is.  
> 
> I would not call virtio-ccw buggy, but it has some constraints that
> virtio-pci apparently doesn't have (and which did not show up so far;
> e.g. virtio-blk schedules a work item on config change, so there's no
> deadlock there.)
> 
> One way to get out of that constraint (don't interact with the config
> space directly in the config changed handler) would be to schedule a
> work item in virtio-ccw that calls virtio_config_changed() for the
> device. My understanding is that delaying the notification to a work
> queue would be fine.

Unfortunately, calling virtio_config_changed() from a work item is not
enough: That function takes the config_lock, and the virtio-ccw code to
get the config both needs to allocate some memory and call schedule :/

The best option really seems to be
- have virtio-balloon move the handling of the config change onto a
  workqueue or something like that, and
- document that you cannot read/write the virtio config space from an
  atomic context

Unless someone has a better idea?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next 3/3] vhost: access vq metadata through kernel virtual address

2019-01-02 Thread Jason Wang


On 2018/12/31 上午2:30, Michael S. Tsirkin wrote:

On Thu, Dec 27, 2018 at 05:39:21PM +0800, Jason Wang wrote:

On 2018/12/26 下午11:02, Michael S. Tsirkin wrote:

On Wed, Dec 26, 2018 at 11:57:32AM +0800, Jason Wang wrote:

On 2018/12/25 下午8:50, Michael S. Tsirkin wrote:

On Tue, Dec 25, 2018 at 06:05:25PM +0800, Jason Wang wrote:

On 2018/12/25 上午2:10, Michael S. Tsirkin wrote:

On Mon, Dec 24, 2018 at 03:53:16PM +0800, Jason Wang wrote:

On 2018/12/14 下午8:36, Michael S. Tsirkin wrote:

On Fri, Dec 14, 2018 at 11:57:35AM +0800, Jason Wang wrote:

On 2018/12/13 下午11:44, Michael S. Tsirkin wrote:

On Thu, Dec 13, 2018 at 06:10:22PM +0800, Jason Wang wrote:

It was noticed that the copy_user() friends that was used to access
virtqueue metdata tends to be very expensive for dataplane
implementation like vhost since it involves lots of software check,
speculation barrier, hardware feature toggling (e.g SMAP). The
extra cost will be more obvious when transferring small packets.

This patch tries to eliminate those overhead by pin vq metadata pages
and access them through vmap(). During SET_VRING_ADDR, we will setup
those mappings and memory accessors are modified to use pointers to
access the metadata directly.

Note, this was only done when device IOTLB is not enabled. We could
use similar method to optimize it in the future.

Tests shows about ~24% improvement on TX PPS when using virtio-user +
vhost_net + xdp1 on TAP (CONFIG_HARDENED_USERCOPY is not enabled):

Before: ~5.0Mpps
After:  ~6.1Mpps

Signed-off-by: Jason Wang
---
   drivers/vhost/vhost.c | 178 ++
   drivers/vhost/vhost.h |  11 +++
   2 files changed, 189 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index bafe39d2e637..1bd24203afb6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -443,6 +443,9 @@ void vhost_dev_init(struct vhost_dev *dev,
vq->indirect = NULL;
vq->heads = NULL;
vq->dev = dev;
+   memset(>avail_ring, 0, sizeof(vq->avail_ring));
+   memset(>used_ring, 0, sizeof(vq->used_ring));
+   memset(>desc_ring, 0, sizeof(vq->desc_ring));
mutex_init(>mutex);
vhost_vq_reset(dev, vq);
if (vq->handle_kick)
@@ -614,6 +617,102 @@ static void vhost_clear_msg(struct vhost_dev *dev)
spin_unlock(>iotlb_lock);
   }
+static int vhost_init_vmap(struct vhost_vmap *map, unsigned long uaddr,
+  size_t size, int write)
+{
+   struct page **pages;
+   int npages = DIV_ROUND_UP(size, PAGE_SIZE);
+   int npinned;
+   void *vaddr;
+
+   pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
+   if (!pages)
+   return -ENOMEM;
+
+   npinned = get_user_pages_fast(uaddr, npages, write, pages);
+   if (npinned != npages)
+   goto err;
+

As I said I have doubts about the whole approach, but this
implementation in particular isn't a good idea
as it keeps the page around forever.

The pages wil be released during set features.



So no THP, no NUMA rebalancing,

For THP, we will probably miss 2 or 4 pages, but does this really matter
consider the gain we have?

We as in vhost? networking isn't the only thing guest does.
We don't even know if this guest does a lot of networking.
You don't
know what else is in this huge page. Can be something very important
that guest touches all the time.

Well, the probability should be very small consider we usually give several
gigabytes to guest. The rest of the pages that doesn't sit in the same
hugepage with metadata can still be merged by THP.  Anyway, I can test the
differences.

Thanks!


For NUMA rebalancing, I'm even not quite sure if
it can helps for the case of IPC (vhost). It looks to me the worst case it
may cause page to be thrash between nodes if vhost and userspace are running
in two nodes.

So again it's a gain for vhost but has a completely unpredictable effect on
other functionality of the guest.

That's what bothers me with this approach.

So:

- The rest of the pages could still be balanced to other nodes, no?

- try to balance metadata pages (belongs to co-operate processes) itself is
still questionable

I am not sure why. It should be easy enough to force the VCPU and vhost
to move (e.g. start them pinned to 1 cpu, then pin them to another one).
Clearly sometimes this would be necessary for load balancing reasons.

Yes, but it looks to me the part of motivation of auto NUMA is to avoid
manual pinning.

... of memory. Yes.



With autonuma after a while (could take seconds but it will happen) the
memory will migrate.


Yes. As you mentioned during the discuss, I wonder we could do it similarly
through mmu notifier like APIC access page in commit c24ae0dcd3e ("kvm: x86:
Unpin and remove kvm_arch->apic_access_page")

That would be a possible approach.


Yes, this looks possible, and the 

Re: [virtio-dev] RE: [PATCH v1 0/2] Virtio: fix some vq allocation issues

2019-01-02 Thread Cornelia Huck
On Tue, 1 Jan 2019 00:40:19 +0100
Halil Pasic  wrote:

> On Mon, 31 Dec 2018 06:03:51 +
> "Wang, Wei W"  wrote:
> 
> > On Sunday, December 30, 2018 2:06 PM, Halil Pasic wrote:  
> > > 
> > > I guess you are the first one trying to read virtio config from within 
> > > interrupt
> > > context. AFAICT this never worked.  
> > 
> > I'm not sure about "never worked". It seems to work well with virtio-pci.
> > But looking forward to hearing a solid reason why reading config inside
> > the handler is forbidden (if that's true).  
> 
> By "never worked" I meant "never worked with virtio-ccw". Sorry
> about the misunderstanding. Seems I've also failed to convey that I don't
> know if reading config inside the handler is forbidden or not. So please
> don't expect me providing the solid reasons you are looking forward to.

It won't work with the current code, and this is all a bit ugly :( More
verbose explanation below.

> 
> >   
> > > About what happens. The apidoc of ccw_device_start() says it needs to be
> > > called with the ccw device lock held, so ccw_io_helper() tries to take it 
> > > (since
> > > forever I guess). OTOH do_cio_interrupt() takes the subchannel lock and
> > > io_subchannel_initialize_dev() makes the ccw device lock be the subchannel
> > > lock. That means when one tries to get virtio config form within a cio
> > > interrupt context we deadlock, because we try to take a lock we already 
> > > have.
> > > 
> > > That said, I don't think this limitation is by design (i.e. intended).
> > > Maybe Connie can help us with that question. AFAIK we have nothing
> > > documented regarding this (neither that can nor can't).

The main problem is that channel I/O is a fundamentally asynchronous
mechanism. As channel devices don't have the concept of config spaces
(or some other things that virtio needs), I decided to map
reading/writing the config space to channel commands. Starting I/O on a
subchannel always needs the lock (to avoid races on the subchannel),
and the asynchronous interrupt for that I/O needs the lock as well (for
the same reason; things like the scsw contain state that you want to
access without races). A config change also means that the subchannel
becomes state pending (and an interrupt is made pending), so the
subchannel lock is taken for that path as well. (Virtqueue
notifications are handled differently on modern QEMU, but that does not
come into play here.)

> > > 
> > > Obviously, there are multiple ways around this problem, and at the moment
> > > I can't tell which would be my preferred one.  
> > 
> > Yes, it's also not difficult to tweak the virtio-balloon code to avoid that 
> > issue.
> > But if that's just an issue with ccw itself, I think it's better to tweak 
> > ccw and
> > remain virtio-balloon unchanged.
> >   
> 
> As I said, at the moment I don't have a preference regarding the fix,
> partly because I'm not sure if "reading config inside the handler" is OK
> or not. Maybe Connie or Michael can help us here. I'm however sure that
> commit 86a5597 "virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT"
> breaks virtio-balloon with the ccw transport (i.e. effectively breaks 
> virtio-balloon on s390): it used to work before and does not work
> after.

Yes, that's unfortunate.

> 
> AFAICT tweaking the balloon code may be simpler than tweaking the
> virtio-ccw (transport code). ccw_io_helper() relies on getting
> an interrupt when the issued IO is done. If virtio-ccw is buggy, it
> needs to be fixed, but I'm not sure it is.

I would not call virtio-ccw buggy, but it has some constraints that
virtio-pci apparently doesn't have (and which did not show up so far;
e.g. virtio-blk schedules a work item on config change, so there's no
deadlock there.)

One way to get out of that constraint (don't interact with the config
space directly in the config changed handler) would be to schedule a
work item in virtio-ccw that calls virtio_config_changed() for the
device. My understanding is that delaying the notification to a work
queue would be fine.

>From what I can see, that should make us safe on any modern QEMU (that
uses adapter interrupts). There still might be problems if we are using
classic subchannel interrupts for virtqueue notifications and the
handler interacts with the config space, but I'm not sure whether it is
worth spending time on that.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization