Re: [PATCH RFC 00/11] qemu: towards virtio-1 host support

2014-10-08 Thread Cornelia Huck
On Tue, 07 Oct 2014 18:24:22 -0700
Andy Lutomirski l...@amacapital.net wrote:

 On 10/07/2014 07:39 AM, Cornelia Huck wrote:
  This patchset aims to get us some way to implement virtio-1 compliant
  and transitional devices in qemu. Branch available at
  
  git://github.com/cohuck/qemu virtio-1
  
  I've mainly focused on:
  - endianness handling
  - extended feature bits
  - virtio-ccw new/changed commands
 
 At the risk of some distraction, would it be worth thinking about a
 solution to the IOMMU bypassing mess as part of this?

I think that is a whole different issue. virtio-1 is basically done - we
just need to implement it - while the IOMMU/DMA stuff certainly needs
more discussion. Therefore, I'd like to defer to the other discussion
thread here.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: BUG_ON in virtio-ring.c

2014-10-08 Thread Alexey Lapitsky
Hi,

I'm hitting this bug with both ext4 and btrfs.

Here's an example of the backtrace:
https://gist.github.com/vzctl/e888a821333979120932

I tried raising this BUG only for direct ring and it solved the problem:

 -   BUG_ON(total_sg  vq-vring.num);
+   BUG_ON(total_sg  vq-vring.num  !vq-indirect);

Shall I submit the patch or is a more elaborate fix required?

--

Alexey



On Mon, May 27, 2013 at 7:38 AM, Rusty Russell ru...@rustcorp.com.au wrote:
 Dave Airlie airl...@gmail.com writes:
 correct.

 If I have an indirect ring and I'm adding sgs to it and the host is
 delayed (say I've got a thread consuming things from the vring and its
 off doing something interesting),
 I'd really like to get ENOSPC back from virtqueue_add. However if the
 indirect addition fails due to free_sg being 0, we hit the BUG_ON
 before we ever get to the ENOSPC check.

 It is correct for the moment: drivers can't assume indirect buffer
 support in the transport.

 BUT for a new device, we could say this depends on indirect descriptor
 support, put the appropriate check in the device init, and then remove
 the BUG_ON().

 But if the transport has indirect buffer support, can it change its
 mind at runtime?

 It's a layering violation.  The current rule is simple:

 A driver should never submit a buffer which can't fit in the
 ring.

 This has the nice property that OOM (ie. indirect buffer alloction
 fail) just slows things down, doesn't break things.

 In this case we have vq-indirect set, but the device has run out of
 free buffers,
 but it isn't a case that in+out would overflow it if it had free
 buffers since it would use
 an indirect and succeed.

 OK, but when do you re-xmit?  What if the ring is empty, and you
 submitted a buffer that needs indirect?  You won't get interrupted
 again, because the device has consumed all the buffers.  You need to
 have your own timer or something equally hackish.

 Getting -ENOSPC is definitely what should happen from what I can see,
 not a BUG_ON,
 I should get a BUG_ON only if the device reports no indirect support.

 I think we should return -ENOMEM if we can't indirect because of failed
 allocation and it doesn't fit in the ring, ie:

 This:
 BUG_ON(total_sg  vq-vring.num);
 BUG_ON(total_sg == 0);

 if (vq-vq.num_free  total_sg) {
 pr_debug(Can't add buf len %i - avail = %i\n,
  total_sg, vq-vq.num_free);
 /* FIXME: for historical reasons, we force a notify here if
  * there are outgoing parts to the buffer.  Presumably the
  * host should service the ring ASAP. */
 if (out_sgs)
 vq-notify(vq-vq);
 END_USE(vq);
 return -ENOSPC;
 }

 Becomes (untested):

 BUG_ON(total_sg == 0);

 if (vq-vq.num_free  total_sg) {
 if (total_sg  vq-vring.num) {
 BUG_ON(!vq-indirect);
/* Return -ENOMEM only if we have nothing else to do */
if (vq-vq.num_free == vq-vring.num)
return -ENOMEM;
 }
 pr_debug(Can't add buf len %i - avail = %i\n,
  total_sg, vq-vq.num_free);
 /* FIXME: for historical reasons, we force a notify here if
  * there are outgoing parts to the buffer.  Presumably the
  * host should service the ring ASAP. */
 if (out_sgs)
 vq-notify(vq-vq);
 END_USE(vq);
 return -ENOSPC;
 }

 Cheers,
 Rusty.


 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-08 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.

This does not provide a security breach as baloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make
the system to return and retry the allocation that forced the out of
memory killer to run.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
CC: virtualization@lists.linux-foundation.org
---
 drivers/virtio/virtio_balloon.c | 46 +
 1 file changed, 46 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 213da41..ca77831 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -28,6 +28,7 @@
 #include linux/slab.h
 #include linux/module.h
 #include linux/balloon_compaction.h
+#include linux/oom.h
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -36,6 +37,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE  
VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
+#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+
+static int oom_vballoon_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_vballoon_pages, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(oom_vballoon_pages, pages to free on OOM);
 
 struct virtio_balloon
 {
@@ -71,6 +78,9 @@ struct virtio_balloon
/* Memory statistics */
int need_stats_update;
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+   /* To register callback in oom notifier call chain */
+   struct notifier_block nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -290,6 +300,33 @@ static void update_balloon_size(struct virtio_balloon *vb)
  actual);
 }
 
+/*
+ * virtballoon_oom_notify - release pages when system is under severe
+ *  memory pressure (called from out_of_memory())
+ * @self : notifier block struct
+ * @dummy: not used
+ * @parm : returned - number of freed pages
+ *
+ * The balancing of memory by use of the virtio balloon should not cause
+ * the termination of processes while there are pages in the balloon.
+ * If virtio balloon manages to release some memory, it will make the system
+ * return and retry the allocation that forced the OOM killer to run.
+ */
+static int virtballoon_oom_notify(struct notifier_block *self,
+ unsigned long dummy, void *parm)
+{
+   unsigned int num_freed_pages;
+   unsigned long *freed = (unsigned long *)parm;
+   struct virtio_balloon *vb = container_of((struct notifier_block *)self,
+struct virtio_balloon, nb);
+
+   num_freed_pages = leak_balloon(vb, oom_vballoon_pages);
+   update_balloon_size(vb);
+   *freed += num_freed_pages;
+
+   return NOTIFY_OK;
+}
+
 static int balloon(void *_vballoon)
 {
struct virtio_balloon *vb = _vballoon;
@@ -474,6 +511,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb_mapping;
 
+   vb-nb.notifier_call = virtballoon_oom_notify;
+   vb-nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
+   err = register_oom_notifier(vb-nb);
+   if (err  0)
+   goto out_oom_notify;
+
vb-thread = kthread_run(balloon, vb, vballoon);
if (IS_ERR(vb-thread)) {
err = PTR_ERR(vb-thread);
@@ -483,6 +526,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
return 0;
 
 out_del_vqs:
+   unregister_oom_notifier(vb-nb);
+out_oom_notify:
vdev-config-del_vqs(vdev);
 out_free_vb_mapping:
balloon_mapping_free(vb_mapping);
@@ -511,6 +556,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
struct virtio_balloon *vb = vdev-priv;
 
+   unregister_oom_notifier(vb-nb);
kthread_stop(vb-thread);

[PATCH 1/2] virtio_balloon: return the amount of freed memory from leak_balloon()

2014-10-08 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

This value would be useful in the next patch to provide the amount of
the freed memory for OOM killer.

Accessing to vb-num_pfns outside of vb-balloon_lock is wrong and unsafe.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
CC: virtualization@lists.linux-foundation.org
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 25ebe8e..213da41 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -168,8 +168,9 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned 
int num)
}
 }
 
-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
+   unsigned num_freed_pages;
struct page *page;
struct balloon_dev_info *vb_dev_info = vb-vb_dev_info;
 
@@ -186,6 +187,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
 
+   num_freed_pages = vb-num_pfns;
/*
 * Note that if
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
@@ -195,6 +197,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
tell_host(vb, vb-deflate_vq);
mutex_unlock(vb-balloon_lock);
release_pages_by_pfn(vb-pfns, vb-num_pfns);
+   return num_freed_pages;
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.9.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 0/2] shrink virtio baloon on OOM in guest

2014-10-08 Thread Denis V. Lunev
Excessive virtio_balloon inflation can cause invocation of OOM-killer, when
Linux is under severe memory pressure. Various mechanisms are responsible for
correct virtio_balloon memory management. Nevertheless it is often the case
that these control tools does not have enough time to react on fast changing
memory load. As a result OS runs out of memory and invokes OOM-killer.
The balancing of memory by use of the virtio balloon should not cause the
termination of processes while there are pages in the balloon. Now there is
no way for virtio balloon driver to free memory at the last moment before
some process get killed by OOM-killer.

This does not provide a security breach as baloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is expected
to be called from the oom notifier call chain in out_of_memory() function.
If virtio balloon could release some memory, it will make the system to
return and retry the allocation that forced the out of memory killer to run.

Patch 1 of this series adds support for implementation of virtio_balloon
callback, so now leak_balloon() function returns number of freed pages.
Patch 2 implements virtio_balloon callback itself.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
CC: virtualization@lists.linux-foundation.org

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization