Re: [RFC PATCH v5 0/8] Make balloon drivers' memory changes known to the rest of the kernel

2022-10-19 Thread Denis V. Lunev via Virtualization

On 10/19/22 12:53, Konstantin Khlebnikov wrote:
On Wed, 19 Oct 2022 at 12:57, Alexander Atanasov 
 wrote:


Currently balloon drivers (Virtio,XEN, HyperV, VMWare, ...)
inflate and deflate the guest memory size but there is no
way to know how much the memory size is changed by them.

Make it possible for the drivers to report the values to mm core.

Display reported InflatedTotal and InflatedFree in /proc/meminfo
and print these values on OOM and sysrq from show_mem().

The two values are the result of the two modes the drivers work
with using adjust_managed_page_count or without.

In earlier versions, there was a notifier for these changes
but after discussion - it is better to implement it in separate
patch series. Since it came out as larger work than initially
expected.

Amount of inflated memory can be used:
 - totalram_pages() users working with drivers not using
    adjust_managed_page_count
 - si_meminfo(..) users can improve calculations
 - by userspace software that monitors memory pressure


Sorry, I see no reason for that series.
Balloon inflation adjusts totalram_pages. That's enough.


no, they are not at least under some circumstances, f.e.
virtio balloon does not do that with VIRTIO_BALLOON_F_DEFLATE_ON_OOM
set

There is no reason to know the amount of non-existent ballooned memory 
inside.

Management software which works outside should care about that.


The problem comes at the moment when we are running
our Linux server inside virtual machine and the customer
comes with crazy questions "where our memory?".

For debugging you could get current balloon size from /proc/vmstat 
(balloon_inflate - balloon_deflate).

Also (I guess) /proc/kpageflags has a bit for that.

Anyway it's easy to monitor balloon inflation by seeing changes of 
total memory size.

for monitoring - may be. But in order to report total amount
there is no interface so far.



Alexander Atanasov (8):
  mm: Make a place for a common balloon code
  mm: Enable balloon drivers to report inflated memory
  mm: Display inflated memory to users
  mm: Display inflated memory in logs
  drivers: virtio: balloon - report inflated memory
  drivers: vmware: balloon - report inflated memory
  drivers: hyperv: balloon - report inflated memory
  documentation: create a document about how balloon drivers operate

 Documentation/filesystems/proc.rst            |   6 +
 Documentation/mm/balloon.rst                  | 138
++
 MAINTAINERS                                   |   4 +-
 arch/powerpc/platforms/pseries/cmm.c          |   2 +-
 drivers/hv/hv_balloon.c                       |  12 ++
 drivers/misc/vmw_balloon.c                    |   3 +-
 drivers/virtio/virtio_balloon.c               |   7 +-
 fs/proc/meminfo.c                             |  10 ++
 .../linux/{balloon_compaction.h => balloon.h} |  18 ++-
 lib/show_mem.c                                |   8 +
 mm/Makefile                                   |   2 +-
 mm/{balloon_compaction.c => balloon.c}        |  19 ++-
 mm/migrate.c                                  |   1 -
 mm/vmscan.c                                   |   1 -
 14 files changed, 213 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/mm/balloon.rst
 rename include/linux/{balloon_compaction.h => balloon.h} (91%)
 rename mm/{balloon_compaction.c => balloon.c} (94%)

v4->v5:
 - removed notifier
 - added documentation
 - vmware update after op is done , outside of the mutex
v3->v4:
 - add support in hyperV and vmware balloon drivers
 - display balloon memory in show_mem so it is logged on OOM and
on sysrq
v2->v3:
 - added missed EXPORT_SYMBOLS
Reported-by: kernel test robot 
 - instead of balloon_common.h just use balloon.h (yes, naming is
hard)
 - cleaned up balloon.h - remove from files that do not use it and
   remove externs from function declarations
v1->v2:
 - reworked from simple /proc/meminfo addition

Cc: Michael S. Tsirkin 
Cc: David Hildenbrand 
Cc: Wei Liu 
Cc: Nadav Amit 
Cc: pv-driv...@vmware.com
Cc: Jason Wang 
Cc: virtualization@lists.linux-foundation.org
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Dexuan Cui 
Cc: linux-hyp...@vger.kernel.org
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Oleksandr Tyshchenko 
Cc: xen-de...@lists.xenproject.org

base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
-- 
2.31.1




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 1/1] drivers/vhost: vhost-blk accelerator for virtio-blk guests

2022-07-26 Thread Denis V. Lunev via Virtualization

On 26.07.2022 16:45, Michael S. Tsirkin wrote:

On Mon, Jul 25, 2022 at 11:27:53PM +0300, Andrey Zhadchenko wrote:

Although QEMU virtio is quite fast, there is still some room for
improvements. Disk latency can be reduced if we handle virito-blk requests
in host kernel istead of passing them to QEMU. The patch adds vhost-blk
kernel module to do so.

Some test setups:
fio --direct=1 --rw=randread  --bs=4k  --ioengine=libaio --iodepth=128
QEMU drive options: cache=none
filesystem: xfs

SSD:
| randread, IOPS  | randwrite, IOPS |
Host   |  95.8k  |  85.3k  |
QEMU virtio|  57.5k  |  79.4k  |
QEMU vhost-blk |  95.6k  |  84.3k  |

RAMDISK (vq == vcpu):
  | randread, IOPS | randwrite, IOPS |
virtio, 1vcpu|  123k  |  129k   |
virtio, 2vcpu|  253k (??) |  250k (??)  |
virtio, 4vcpu|  158k  |  154k   |
vhost-blk, 1vcpu |  110k  |  113k   |
vhost-blk, 2vcpu |  247k  |  252k   |
vhost-blk, 4vcpu |  576k  |  567k   |

Signed-off-by: Andrey Zhadchenko 


Sounds good to me. What this depends on is whether some userspace
will actually use it. In the past QEMU rejected support for vhost-blk,
if this time it fares better then I won't have a problem merging
the kernel bits.


In general, we are going to use this in production for our
next generation product and thus we would be very glad
to see both parts, i.e. in-kernel and in-QEMU to be merged
to reduce our support costs.

I think that numbers are talking on themselves and this
would be quite beneficial for all parties especially keeping
in mind that QCOW2 is also now supported for this kind
of IO engine.

Den
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_balloon: prevent uninitialized variable use

2017-03-23 Thread Denis V. Lunev
On 03/23/2017 06:17 PM, Arnd Bergmann wrote:
> The latest gcc-7.0.1 snapshot reports a new warning:
>
> virtio/virtio_balloon.c: In function 'update_balloon_stats':
> virtio/virtio_balloon.c:258:26: error: 'events[2]' is used uninitialized in 
> this function [-Werror=uninitialized]
> virtio/virtio_balloon.c:260:26: error: 'events[3]' is used uninitialized in 
> this function [-Werror=uninitialized]
> virtio/virtio_balloon.c:261:56: error: 'events[18]' is used uninitialized in 
> this function [-Werror=uninitialized]
> virtio/virtio_balloon.c:262:56: error: 'events[17]' is used uninitialized in 
> this function [-Werror=uninitialized]
>
> This seems absolutely right, so we should add an extra check to
> prevent copying uninitialized stack data into the statistics.
> From all I can tell, this has been broken since the statistics code
> was originally added in 2.6.34.
>
> Fixes: 9564e138b1f6 ("virtio: Add memory statistics reporting to the balloon 
> driver (V4)")
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
Reviewed-by: Denis V. Lunev <d...@openvz.org>



> ---
>  drivers/virtio/virtio_balloon.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 4e1191508228..cd5c54e2003d 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -254,12 +254,14 @@ static void update_balloon_stats(struct virtio_balloon 
> *vb)
>  
>   available = si_mem_available();
>  
> +#ifdef CONFIG_VM_EVENT_COUNTERS
>   update_stat(vb, idx++, VIRTIO_BALLOON_S_SWAP_IN,
>   pages_to_bytes(events[PSWPIN]));
>   update_stat(vb, idx++, VIRTIO_BALLOON_S_SWAP_OUT,
>   pages_to_bytes(events[PSWPOUT]));
>   update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
>   update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
> +#endif
>   update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
>   pages_to_bytes(i.freeram));
>   update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMTOT,

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 1/1] virtio: update balloon size in balloon "probe"

2016-09-29 Thread Denis V. Lunev
From: Konstantin Neumoin <kneum...@virtuozzo.com>

The following commit 'fad7b7b27b6a (virtio_balloon: Use a workqueue
instead of "vballoon" kthread)' has added a regression. Original code with
kthread starts the thread inside probe and checks the necessity to update
balloon inside the thread immediately.

Nowadays the code behaves differently. Work is queued only on the first
command from the host after the negotiation. Thus there is a window
especially at the guest startup or the module reloading when the balloon
size is not updated until the notification from the host.

This patch adds balloon size check at the end of the probe to match
original behaviour.

Signed-off-by: Konstantin Neumoin <kneum...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 2 ++
 1 file changed, 2 insertions(+)

Changes from v1:
- fixed description
- removed update_balloon_size() call

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4e7003d..181793f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -577,6 +577,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 
virtio_device_ready(vdev);
 
+   if (towards_target(vb))
+   virtballoon_changed(vdev);
return 0;
 
 out_del_vqs:
-- 
2.7.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/1] update balloon size in balloon "probe"

2016-09-23 Thread Denis V. Lunev
From: Konstantin Neumoin <kneum...@virtuozzo.com>

Patch
Commit 3d2a3774c1b046f548ebea0391a602fd5685a307
Author: Michael S. Tsirkin <m...@redhat.com>
Date:   Tue Mar 10 11:55:08 2015 +1030
virtio-balloon: do not call blocking ops when !TASK_RUNNING
has added a regression. Original code with wait_event_interruptible
checked the condition before start waiting and started balloon operations
if necessary.

Right now balloon is not inflated if ballon target is set before the
driver is loaded.

Signed-off-by: Konstantin Neumoin <kneum...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: "Michael S. Tsirkin" <m...@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4e7003d..0a6c10f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -577,6 +577,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
 
virtio_device_ready(vdev);
 
+   if (towards_target(vb))
+   virtballoon_changed(vdev);
+   update_balloon_size(vb);
+
return 0;
 
 out_del_vqs:
-- 
2.7.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/1] balloon: stop inflate balloon after oom notification

2016-09-09 Thread Denis V. Lunev
From: Konstantin Neumoin <kneum...@virtuozzo.com>

At this moment oom notification in balloon does not work as expected.
After virtballoon_oom_notify there is an infinitive loop:
 - virtballoon_oom_notify was called and balloon was deflated
 - balloon get notification that config was changed, compare target and
   actual and try to reach target again.

This patch adds global variable fail_counter which indicates that oom has
been happened. We check that fail_counter was changed between calls
update_balloon_size_func. In this case we should not try to inflate balloon
even if actual != target.

Signed-off-by: Konstantin Neumoin <kneum...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4e7003d..253bf05 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -50,6 +50,8 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+static unsigned long fail_count;
+
 struct virtio_balloon {
struct virtio_device *vdev;
struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -361,6 +363,8 @@ static int virtballoon_oom_notify(struct notifier_block 
*self,
unsigned long *freed;
unsigned num_freed_pages;
 
+   fail_count++;
+
vb = container_of(self, struct virtio_balloon, nb);
if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
return NOTIFY_OK;
@@ -386,11 +390,22 @@ static void update_balloon_size_func(struct work_struct 
*work)
 {
struct virtio_balloon *vb;
s64 diff;
+   static unsigned long fc;
+
+   if (fc == 0)
+   fc = fail_count;
 
vb = container_of(work, struct virtio_balloon,
  update_balloon_size_work);
diff = towards_target(vb);
 
+   if (fc != fail_count) {
+   fc = fail_count;
+   /* Don't inflate balloon after oom notification */
+   if (diff > 0)
+   return;
+   }
+
if (diff > 0)
diff -= fill_balloon(vb, diff);
else if (diff < 0)
-- 
2.7.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/1] balloon: check the number of available pages in leak balloon

2016-07-11 Thread Denis V. Lunev
From: Konstantin Neumoin <kneum...@virtuozzo.com>

The balloon has a special mechanism that is subscribed to the oom
notification which leads to deflation for a fixed number of pages.
The number is always fixed even when the balloon is fully deflated.
But leak_balloon did not expect that the pages to deflate will be more
than taken, and raise a "BUG" in balloon_page_dequeue when page list
will be empty.

So, the simplest solution would be to check that the number of releases
pages is less or equal to the number taken pages.

Signed-off-by: Konstantin Neumoin <kneum...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 476c0e3..f6ea8f4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -202,6 +202,8 @@ static unsigned leak_balloon(struct virtio_balloon *vb, 
size_t num)
num = min(num, ARRAY_SIZE(vb->pfns));
 
mutex_lock(>balloon_lock);
+   /* We can't release more pages than taken */
+   num = min(num, (size_t)vb->num_pages);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
page = balloon_page_dequeue(vb_dev_info);
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_balloon: export 'available' memory to balloon statistics

2016-02-23 Thread Denis V. Lunev

On 02/23/2016 06:53 PM, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2016 at 06:26:47PM +0300, Denis V. Lunev wrote:

On 02/23/2016 06:10 PM, Michael S. Tsirkin wrote:

On Tue, Feb 16, 2016 at 06:50:52PM +0300, Denis V. Lunev wrote:

From: Igor Redko <red...@virtuozzo.com>

Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
statistics protocol, corresponding to 'Available' in /proc/meminfo.

It indicates to the hypervisor how big the balloon can be inflated
without pushing the guest system to swap.

Signed-off-by: Igor Redko <red...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Andrew Morton <a...@linux-foundation.org>

Oops - I missed the fact that this affects host/guest ABI.

Can you please submit ABI update proposal to virtio tc?
Spec patch would be even better.

This is important to ensure there are no conflicts
with other features being developed in parallel.

hmmm

 From my point of view ABI remains untouched.

Anything exposed by guest to host is ABI.
Once we add stuff there, we never can remove it
as some host might rely on it.


The guest can send any amount of ;
pairs and unknown tags are properly ignored
by the host.

That is why I think that this change is safe.

What happens if someone uses the tag you
used for VIRTIO_BALLOON_S_AVAIL, for some
other purpose?
Any tools using VIRTIO_BALLOON_S_AVAIL will be confused.

actually this constant resides in QEMU only,
values are reported above using JSON and
string tags.


Really, it's not hard to get a tag number from virtio TC,
so please just do this.


ok. So do you propose to negotiate maximum allowed
tag to send at the driver start time?

we will have to guard this exchange with proper flag
in feature space then. This could be done but from my
point of view this looks like serious over-complication.
Do we have somebody who can judge?

Den
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_balloon: export 'available' memory to balloon statistics

2016-02-23 Thread Denis V. Lunev

On 02/23/2016 06:10 PM, Michael S. Tsirkin wrote:

On Tue, Feb 16, 2016 at 06:50:52PM +0300, Denis V. Lunev wrote:

From: Igor Redko <red...@virtuozzo.com>

Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
statistics protocol, corresponding to 'Available' in /proc/meminfo.

It indicates to the hypervisor how big the balloon can be inflated
without pushing the guest system to swap.

Signed-off-by: Igor Redko <red...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Andrew Morton <a...@linux-foundation.org>

Oops - I missed the fact that this affects host/guest ABI.

Can you please submit ABI update proposal to virtio tc?
Spec patch would be even better.

This is important to ensure there are no conflicts
with other features being developed in parallel.


hmmm

From my point of view ABI remains untouched.
The guest can send any amount of ;
pairs and unknown tags are properly ignored
by the host.

That is why I think that this change is safe.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/2] calculate 'available' memory in the separate function

2016-02-16 Thread Denis V. Lunev
From: Igor Redko <red...@virtuozzo.com>

Factor out calculation of the available memory counter into a separate
exportable function, in order to be able to use it in other parts of
the kernel.

In particular, it appears a relevant metric to report to the
hypervisor via virtio-balloon statistics interface (in a followup
patch).

Signed-off-by: Igor Redko <red...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Andrew Morton <a...@linux-foundation.org>
---
 fs/proc/meminfo.c  | 31 +--
 include/linux/mm.h |  1 +
 mm/page_alloc.c| 43 +++
 3 files changed, 45 insertions(+), 30 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index df4661a..8372046 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -29,10 +29,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
unsigned long committed;
long cached;
long available;
-   unsigned long pagecache;
-   unsigned long wmark_low = 0;
unsigned long pages[NR_LRU_LISTS];
-   struct zone *zone;
int lru;
 
 /*
@@ -51,33 +48,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
pages[lru] = global_page_state(NR_LRU_BASE + lru);
 
-   for_each_zone(zone)
-   wmark_low += zone->watermark[WMARK_LOW];
-
-   /*
-* Estimate the amount of memory available for userspace allocations,
-* without causing swapping.
-*/
-   available = i.freeram - totalreserve_pages;
-
-   /*
-* Not all the page cache can be freed, otherwise the system will
-* start swapping. Assume at least half of the page cache, or the
-* low watermark worth of cache, needs to stay.
-*/
-   pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
-   pagecache -= min(pagecache / 2, wmark_low);
-   available += pagecache;
-
-   /*
-* Part of the reclaimable slab consists of items that are in use,
-* and cannot be freed. Cap this estimate at the low watermark.
-*/
-   available += global_page_state(NR_SLAB_RECLAIMABLE) -
-min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low);
-
-   if (available < 0)
-   available = 0;
+   available = si_mem_available();
 
/*
 * Tagged format, for easy grepping and expansion.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 516e149..a8c4144 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1862,6 +1862,7 @@ extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
 extern void __init mmap_init(void);
 extern void show_mem(unsigned int flags);
+extern long si_mem_available(void);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 838ca8bb..dae813c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3603,6 +3603,49 @@ static inline void show_node(struct zone *zone)
printk("Node %d ", zone_to_nid(zone));
 }
 
+long si_mem_available(void)
+{
+   long available;
+   unsigned long pagecache;
+   unsigned long wmark_low = 0;
+   unsigned long pages[NR_LRU_LISTS];
+   struct zone *zone;
+   int lru;
+
+   for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
+   pages[lru] = global_page_state(NR_LRU_BASE + lru);
+
+   for_each_zone(zone)
+   wmark_low += zone->watermark[WMARK_LOW];
+
+   /*
+* Estimate the amount of memory available for userspace allocations,
+* without causing swapping.
+*/
+   available = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
+
+   /*
+* Not all the page cache can be freed, otherwise the system will
+* start swapping. Assume at least half of the page cache, or the
+* low watermark worth of cache, needs to stay.
+*/
+   pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
+   pagecache -= min(pagecache / 2, wmark_low);
+   available += pagecache;
+
+   /*
+* Part of the reclaimable slab consists of items that are in use,
+* and cannot be freed. Cap this estimate at the low watermark.
+*/
+   available += global_page_state(NR_SLAB_RECLAIMABLE) -
+min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low);
+
+   if (available < 0)
+   available = 0;
+   return available;
+}
+EXPORT_SYMBOL_GPL(si_mem_available);
+
 void si_meminfo(struct sysinfo *val)
 {
val->totalram = totalram_pages;
-- 
2.5.0

___
Virtualization mailing list
Virtuali

[PATCH 2/2] virtio_balloon: export 'available' memory to balloon statistics

2016-02-16 Thread Denis V. Lunev
From: Igor Redko <red...@virtuozzo.com>

Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
statistics protocol, corresponding to 'Available' in /proc/meminfo.

It indicates to the hypervisor how big the balloon can be inflated
without pushing the guest system to swap.

Signed-off-by: Igor Redko <red...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Andrew Morton <a...@linux-foundation.org>
---
 drivers/virtio/virtio_balloon.c | 6 ++
 include/uapi/linux/virtio_balloon.h | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0c3691f..f2b77de 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -229,10 +230,13 @@ static void update_balloon_stats(struct virtio_balloon 
*vb)
unsigned long events[NR_VM_EVENT_ITEMS];
struct sysinfo i;
int idx = 0;
+   long available;
 
all_vm_events(events);
si_meminfo();
 
+   available = si_mem_available();
+
update_stat(vb, idx++, VIRTIO_BALLOON_S_SWAP_IN,
pages_to_bytes(events[PSWPIN]));
update_stat(vb, idx++, VIRTIO_BALLOON_S_SWAP_OUT,
@@ -243,6 +247,8 @@ static void update_balloon_stats(struct virtio_balloon *vb)
pages_to_bytes(i.freeram));
update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMTOT,
pages_to_bytes(i.totalram));
+   update_stat(vb, idx++, VIRTIO_BALLOON_S_AVAIL,
+   pages_to_bytes(available));
 }
 
 /*
diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virtio_balloon.h
index d7f1cbc..343d7dd 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -51,7 +51,8 @@ struct virtio_balloon_config {
 #define VIRTIO_BALLOON_S_MINFLT   3   /* Number of minor faults */
 #define VIRTIO_BALLOON_S_MEMFREE  4   /* Total amount of free memory */
 #define VIRTIO_BALLOON_S_MEMTOT   5   /* Total amount of memory */
-#define VIRTIO_BALLOON_S_NR   6
+#define VIRTIO_BALLOON_S_AVAIL6   /* Available memory as in /proc */
+#define VIRTIO_BALLOON_S_NR   7
 
 /*
  * Memory statistics structure.
-- 
2.5.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 0/2] export 'available' memory to virtio balloon statistics

2016-02-16 Thread Denis V. Lunev
Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
statistics protocol, corresponding to 'Available' in /proc/meminfo.

It indicates to the hypervisor how big the balloon can be inflated
without pushing the guest system to swap. This metric would be very
useful in VM orchestration software to improve memory management
of different VMs under overcommit.

Signed-off-by: Igor Redko <red...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Andrew Morton <a...@linux-foundation.org>

Igor Redko (2):
  calculate 'available' memory in the separate function
  virtio_balloon: export 'available' memory to balloon statistics

 drivers/virtio/virtio_balloon.c |  6 ++
 fs/proc/meminfo.c   | 31 +-
 include/linux/mm.h  |  1 +
 include/uapi/linux/virtio_balloon.h |  3 ++-
 mm/page_alloc.c | 43 +
 5 files changed, 53 insertions(+), 31 deletions(-)

-- 
2.5.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [kvm-unit-tests PATCH] x86: hyperv_synic: Hyper-V SynIC test

2015-11-02 Thread Denis V. Lunev

On 11/02/2015 03:16 PM, Paolo Bonzini wrote:

On 26/10/2015 10:56, Andrey Smetanin wrote:

Hyper-V SynIC is a Hyper-V synthetic interrupt controller.

The test runs on every vCPU and performs the following steps:
* read from all Hyper-V SynIC MSR's
* setup Hyper-V SynIC evt/msg pages
* setup SINT's routing
* inject SINT's into destination vCPU by 'hyperv-synic-test-device'
* wait for SINT's isr's completion
* clear Hyper-V SynIC evt/msg pages and destroy SINT's routing

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Roman Kagan <rka...@virtuozzo.com>
CC: Denis V. Lunev <d...@openvz.org>
CC: qemu-de...@nongnu.org
CC: virtualization@lists.linux-foundation.org

Bad news.

The test breaks with APICv, because of the following sequence of events:

1) non-auto-EOI interrupt 176 is injected into IRR and ISR

2) The PPR register is now 176

3) auto-EOI interrupt 179 is injected into IRR only, because (179 &
0xf0) <= (PPR & 0xf0)

4) interrupt 176 ISR performs an EOI

5) at this point, because virtual interrupt delivery is enabled, the
processor does not perform TPR virtualization (SDM 29.1.2).

In addition (and even worse) because virtual interrupt delivery is
enabled, an auto-EOI interrupt that was stashed in IRR can be injected
by the processor, and the auto-EOI behavior will be skipped.

The solution is to have userspace enable KVM_CAP_HYPERV_SYNIC through
KVM_ENABLE_CAP, and modify vmx.c to not use apicv on VMs that have it
enabled.  This requires some changes to the callbacks that only work if
enable_apicv or !enable_apicv:

if (enable_apicv)
kvm_x86_ops->update_cr8_intercept = NULL;
else {
kvm_x86_ops->hwapic_irr_update = NULL;
kvm_x86_ops->hwapic_isr_update = NULL;
kvm_x86_ops->deliver_posted_interrupt = NULL;
kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
}

The question then is... does Hyper-V actually use auto-EOI interrupts?
If it doesn't, we might as well not implement them... :/

I'm keeping the kernel patches queued for my own testing, but this of
course has to be fixed before including them---which will delay this
feature to 4.5, unfortunately.

Paolo


well, the problem is that it actually uses auto EOI

Den
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH 3/7] linux-headers/kvm: add Hyper-V SynIC irq routing type and struct

2015-10-26 Thread Denis V. Lunev

On 10/26/2015 01:03 PM, Peter Maydell wrote:

On 26 October 2015 at 09:50, Andrey Smetanin <asmeta...@virtuozzo.com> wrote:

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Roman Kagan <rka...@virtuozzo.com>
CC: Denis V. Lunev <d...@openvz.org>
CC: k...@vger.kernel.org
CC: virtualization@lists.linux-foundation.org

---
  linux-headers/linux/kvm.h | 8 
  1 file changed, 8 insertions(+)

Hi. Changes to linux-headers/ should only be made as part of
an automated update from a mainline Linux kernel tree using
the scripts/update-linux-headers.sh script. This patch looks
like maybe it was a manual edit ?

thanks
-- PMM


yep. We know and have discussed this with Paolo already.
Kernel stuff is in progress at the moment. The patch
is presented to interested people to allow to compile and
run.

Actual merge will be performed with proper sync
when kernel will be in rc3 or 4 stage and the patch will be
dropped.

The same applies for patch 5.

Den
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 7/9] kvm/x86: split ioapic-handled and EOI exit bitmaps

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/ioapic.c   |  4 ++--
 arch/x86/kvm/ioapic.h   |  7 ---
 arch/x86/kvm/irq_comm.c |  6 +++---
 arch/x86/kvm/lapic.c|  2 +-
 arch/x86/kvm/svm.c  |  2 +-
 arch/x86/kvm/vmx.c  |  3 +--
 arch/x86/kvm/x86.c  | 12 +++-
 8 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53deb27..07f7cd7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -400,7 +400,7 @@ struct kvm_vcpu_arch {
u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
-   u64 eoi_exit_bitmap[4];
+   DECLARE_BITMAP(ioapic_handled_vectors, 256);
unsigned long apic_attention;
int32_t apic_arb_prio;
int mp_state;
@@ -833,7 +833,7 @@ struct kvm_x86_ops {
int (*cpu_uses_apicv)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
-   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
+   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2dcda0f..3cf7a9c 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -233,7 +233,7 @@ static void kvm_ioapic_inject_all(struct kvm_ioapic 
*ioapic, unsigned long irr)
 }
 
 
-void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong 
*ioapic_handled_vectors)
 {
struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
union kvm_ioapic_redirect_entry *e;
@@ -248,7 +248,7 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
if (kvm_apic_match_dest(vcpu, NULL, 0,
e->fields.dest_id, e->fields.dest_mode))
__set_bit(e->fields.vector,
-   (unsigned long *)eoi_exit_bitmap);
+ ioapic_handled_vectors);
}
}
spin_unlock(>lock);
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 084617d..2d16dc2 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -121,7 +121,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
-void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
-void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
-
+void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
+  ulong *ioapic_handled_vectors);
+void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
+   ulong *ioapic_handled_vectors);
 #endif
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 6f922c2..fe91f72 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -371,7 +371,8 @@ void kvm_arch_post_irq_routing_update(struct kvm *kvm)
kvm_make_scan_ioapic_request(kvm);
 }
 
-void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
+   ulong *ioapic_handled_vectors)
 {
struct kvm *kvm = vcpu->kvm;
struct kvm_kernel_irq_routing_entry *entry;
@@ -398,8 +399,7 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
dest_mode)) {
u32 vect

[PATCH 2/9] kvm/eventfd: factor out kvm_notify_acked_gsi()

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Factor out kvm_notify_acked_gsi() helper to iterate over EOI listeners
and notify those matching the given gsi.

It will be reused in the upcoming Hyper-V SynIC implementation.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/eventfd.c   | 16 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9596a2f..b66861c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -829,6 +829,7 @@ int kvm_set_irq_inatomic(struct kvm *kvm, int 
irq_source_id, u32 irq, int level)
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
int irq_source_id, int level, bool line_status);
 bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin);
+void kvm_notify_acked_gsi(struct kvm *kvm, int gsi);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 518421e..f6b986a 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -451,9 +451,18 @@ bool kvm_irq_has_notifier(struct kvm *kvm, unsigned 
irqchip, unsigned pin)
 }
 EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
 
-void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
+void kvm_notify_acked_gsi(struct kvm *kvm, int gsi)
 {
struct kvm_irq_ack_notifier *kian;
+
+   hlist_for_each_entry_rcu(kian, >irq_ack_notifier_list,
+link)
+   if (kian->gsi == gsi)
+   kian->irq_acked(kian);
+}
+
+void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
+{
int gsi, idx;
 
trace_kvm_ack_irq(irqchip, pin);
@@ -461,10 +470,7 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned 
irqchip, unsigned pin)
idx = srcu_read_lock(>irq_srcu);
gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
if (gsi != -1)
-   hlist_for_each_entry_rcu(kian, >irq_ack_notifier_list,
-link)
-   if (kian->gsi == gsi)
-   kian->irq_acked(kian);
+   kvm_notify_acked_gsi(kvm, gsi);
srcu_read_unlock(>irq_srcu, idx);
 }
 
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 3/9] kvm/eventfd: add arch-specific set_irq

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Allow for arch-specific interrupt types to be set.  For that, add
kvm_arch_set_irq() which takes interrupt type-specific action if it
recognizes the interrupt type given, and -EWOULDBLOCK otherwise.

The default implementation always returns -EWOULDBLOCK.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 include/linux/kvm_host.h |  4 
 virt/kvm/eventfd.c   | 13 -
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b66861c..eba9cae 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -828,6 +828,10 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 
irq, int level,
 int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int 
level);
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
int irq_source_id, int level, bool line_status);
+
+int kvm_arch_set_irq(struct kvm_kernel_irq_routing_entry *irq, struct kvm *kvm,
+int irq_source_id, int level, bool line_status);
+
 bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_notify_acked_gsi(struct kvm *kvm, int gsi);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f6b986a..e29fd26 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -171,6 +171,15 @@ irqfd_deactivate(struct kvm_kernel_irqfd *irqfd)
queue_work(irqfd_cleanup_wq, >shutdown);
 }
 
+int __attribute__((weak)) kvm_arch_set_irq(
+   struct kvm_kernel_irq_routing_entry *irq,
+   struct kvm *kvm, int irq_source_id,
+   int level,
+   bool line_status)
+{
+   return -EWOULDBLOCK;
+}
+
 /*
  * Called with wqh->lock held and interrupts disabled
  */
@@ -195,7 +204,9 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, 
void *key)
if (irq.type == KVM_IRQ_ROUTING_MSI)
kvm_set_msi(, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1,
false);
-   else
+   else if (kvm_arch_set_irq(, kvm,
+ KVM_USERSPACE_IRQ_SOURCE_ID, 1,
+ false) == -EWOULDBLOCK)
schedule_work(>inject);
srcu_read_unlock(>irq_srcu, idx);
}
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 4/9] kvm/irqchip: allow only multiple irqchip routes per GSI

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Any other irq routing types (MSI, S390_ADAPTER, upcoming Hyper-V
SynIC) map one-to-one to GSI.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 virt/kvm/irqchip.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 716a1c4..f0b08a2 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -144,11 +144,11 @@ static int setup_routing_entry(struct 
kvm_irq_routing_table *rt,
 
/*
 * Do not allow GSI to be mapped to the same irqchip more than once.
-* Allow only one to one mapping between GSI and MSI.
+* Allow only one to one mapping between GSI and non-irqchip routing.
 */
hlist_for_each_entry(ei, >map[ue->gsi], link)
-   if (ei->type == KVM_IRQ_ROUTING_MSI ||
-   ue->type == KVM_IRQ_ROUTING_MSI ||
+   if (ei->type != KVM_IRQ_ROUTING_IRQCHIP ||
+   ue->type != KVM_IRQ_ROUTING_IRQCHIP ||
ue->u.irqchip.irqchip == ei->irqchip.irqchip)
return r;
 
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 5/9] kvm/irqchip: kvm_arch_irq_routing_update renaming split

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Actually kvm_arch_irq_routing_update() should be
kvm_arch_post_irq_routing_update() as it's called at the end
of irq routing update.

This renaming frees kvm_arch_irq_routing_update function name.
kvm_arch_irq_routing_update() weak function which will be used
to update mappings for arch-specific irq routing entries
(in particular, the upcoming Hyper-V synthetic interrupts).

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/x86/kvm/irq_comm.c  | 2 +-
 include/linux/kvm_host.h | 5 +++--
 virt/kvm/irqchip.c   | 7 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index c892289..6f922c2 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -364,7 +364,7 @@ int kvm_setup_empty_irq_routing(struct kvm *kvm)
return kvm_set_irq_routing(kvm, empty_routing, 0, 0);
 }
 
-void kvm_arch_irq_routing_update(struct kvm *kvm)
+void kvm_arch_post_irq_routing_update(struct kvm *kvm)
 {
if (ioapic_in_kernel(kvm) || !irqchip_in_kernel(kvm))
return;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eba9cae..c742e79 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -473,12 +473,12 @@ void vcpu_put(struct kvm_vcpu *vcpu);
 
 #ifdef __KVM_HAVE_IOAPIC
 void kvm_vcpu_request_scan_ioapic(struct kvm *kvm);
-void kvm_arch_irq_routing_update(struct kvm *kvm);
+void kvm_arch_post_irq_routing_update(struct kvm *kvm);
 #else
 static inline void kvm_vcpu_request_scan_ioapic(struct kvm *kvm)
 {
 }
-static inline void kvm_arch_irq_routing_update(struct kvm *kvm)
+static inline void kvm_arch_post_irq_routing_update(struct kvm *kvm)
 {
 }
 #endif
@@ -1080,6 +1080,7 @@ static inline void kvm_irq_routing_update(struct kvm *kvm)
 {
 }
 #endif
+void kvm_arch_irq_routing_update(struct kvm *kvm);
 
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index f0b08a2..fe84e1a 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -166,6 +166,10 @@ out:
return r;
 }
 
+void __attribute__((weak)) kvm_arch_irq_routing_update(struct kvm *kvm)
+{
+}
+
 int kvm_set_irq_routing(struct kvm *kvm,
const struct kvm_irq_routing_entry *ue,
unsigned nr,
@@ -219,9 +223,10 @@ int kvm_set_irq_routing(struct kvm *kvm,
old = kvm->irq_routing;
rcu_assign_pointer(kvm->irq_routing, new);
kvm_irq_routing_update(kvm);
+   kvm_arch_irq_routing_update(kvm);
mutex_unlock(>irq_lock);
 
-   kvm_arch_irq_routing_update(kvm);
+   kvm_arch_post_irq_routing_update(kvm);
 
synchronize_srcu_expedited(>irq_srcu);
 
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/9] kvm/eventfd: avoid loop inside irqfd_update()

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

The loop(for) inside irqfd_update() is unnecessary
because any other value for irq_entry.type will just trigger
schedule_work(>inject).

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 virt/kvm/eventfd.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index b637965..518421e 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -238,20 +238,17 @@ static void irqfd_update(struct kvm *kvm, struct 
kvm_kernel_irqfd *irqfd)
 {
struct kvm_kernel_irq_routing_entry *e;
struct kvm_kernel_irq_routing_entry entries[KVM_NR_IRQCHIPS];
-   int i, n_entries;
+   int n_entries;
 
n_entries = kvm_irq_map_gsi(kvm, entries, irqfd->gsi);
 
write_seqcount_begin(>irq_entry_sc);
 
-   irqfd->irq_entry.type = 0;
-
e = entries;
-   for (i = 0; i < n_entries; ++i, ++e) {
-   /* Only fast-path MSI. */
-   if (e->type == KVM_IRQ_ROUTING_MSI)
-   irqfd->irq_entry = *e;
-   }
+   if (n_entries == 1)
+   irqfd->irq_entry = *e;
+   else
+   irqfd->irq_entry.type = 0;
 
write_seqcount_end(>irq_entry_sc);
 }
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 0/9] Hyper-V synthetic interrupt controller

2015-10-16 Thread Denis V. Lunev
This patchset implements the KVM part of the synthetic interrupt
controller (SynIC) which is a building block of the Hyper-V
paravirtualized device bus (vmbus).

SynIC is a lapic extension, which is controlled via MSRs and maintains
for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Besides, a new vcpu exit is introduced to notify the userspace of the
changes in SynIC configuraion triggered by guest writing to the
corresponding MSRs.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>

Changes v2:
* irqchip/eventfd preparation improvements to support
  arch specific routing entries like Hyper-V SynIC ones.
* add Hyper-V SynIC vectors into EOI exit bitmap.
* do not use posted interrupts in case of Hyper-V SynIC
  AutoEOI vectors

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 8/9] kvm/x86: Hyper-V synthetic interrupt controller

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SyniIC SINT msr write logic simplified
---
 arch/x86/include/asm/kvm_host.h |  14 ++
 arch/x86/kvm/hyperv.c   | 297 
 arch/x86/kvm/hyperv.h   |  21 +++
 arch/x86/kvm/irq_comm.c |  34 +
 arch/x86/kvm/lapic.c|  18 ++-
 arch/x86/kvm/lapic.h|   5 +
 arch/x86/kvm/x86.c  |  12 +-
 include/linux/kvm_host.h|   6 +
 include/uapi/linux/kvm.h|   8 ++
 9 files changed, 407 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 07f7cd7..dfdaf0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -374,10 +375,23 @@ struct kvm_mtrr {
struct list_head head;
 };
 
+/* Hyper-V synthetic interrupt controller (SynIC)*/
+struct kvm_vcpu_hv_synic {
+   u64 version;
+   u64 control;
+   u64 msg_page;
+   u64 evt_page;
+   atomic64_t sint[HV_SYNIC_SINT_COUNT];
+   atomic_t sint_to_gsi[HV_SYNIC_SINT_COUNT];
+   DECLARE_BITMAP(auto_eoi_bitmap, 256);
+   DECLARE_BITMAP(vec_bitmap, 256);
+};
+
 /* Hyper-V per vcpu emulation context */
 struct kvm_vcpu_hv {
u64 hv_vapic;
s64 runtime_offset;
+   struct kvm_vcpu_hv_synic synic;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 62cf8c9..8ff71f3 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -23,13 +23,296 @@
 
 #include "x86.h"
 #include "lapic.h"
+#include "ioapic.h"
 #include "hyperv.h"
 
 #include 
+#include 
 #include 
 
 #include "trace.h"
 
+static inline u64 synic_read_sint(struct kvm_vcpu_hv_synic *synic, int sint)
+{
+   return atomic64_read(>sint[sint]);
+}
+
+static inline int synic_get_sint_vector(u64 sint_value)
+{
+   if (sint_value & HV_SYNIC_SINT_MASKED)
+   return -1;
+   return sint_value & HV_SYNIC_SINT_VECTOR_MASK;
+}
+
+static bool synic_has_vector_connected(struct kvm_vcpu_hv_synic *synic,
+ int vector)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(synic->sint); i++) {
+   if (synic_get_sint_vector(synic_read_sint(synic, i)) == vector)
+   return true;
+   }
+   return false;
+}
+
+static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic,
+int vector)
+{
+   int i;
+   u64 sint_value;
+
+   for (i = 0; i < ARRAY_SIZE(synic->sint); i++) {
+   sint_value = synic_read_sint(synic, i);
+   if (synic_get_sint_vector(sint_value) == vector &&
+   sint_value & HV_SYNIC_SINT_AUTO_EOI)
+   return true;
+   }
+   return false;
+}
+
+static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint, u64 data)
+{
+   int vector;
+
+   vector = data & HV_SYNIC_SINT_VECTOR_MASK;
+   if (vector < 16)
+   return 1;
+   /*
+* Guest may configure multiple SINTs to use the same vector, so
+* we maintain a bitmap of vectors handled by synic, and a
+* bitmap of vectors with auto-eoi behavior.  The bitmaps are
+* updated here, and atomically queried on fast paths.
+*/
+
+   atomic64_set(>sint[si

[PATCH 9/9] kvm/x86: Hyper-V kvm exit

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtiozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 Documentation/virtual/kvm/api.txt |  6 ++
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/hyperv.c | 17 +
 arch/x86/kvm/x86.c|  6 ++
 include/linux/kvm_host.h  |  1 +
 include/uapi/linux/kvm.h  | 17 +
 6 files changed, 48 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 092ee9f..86cae88 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3331,6 +3331,12 @@ the userspace IOAPIC should process the EOI and 
retrigger the interrupt if
 it is still asserted.  Vector is the LAPIC interrupt vector for which the
 EOI was received.
 
+   /* KVM_EXIT_HYPERV */
+struct kvm_hyperv_exit hyperv;
+Indicates that the VCPU exits into userspace to process some tasks
+related with Hyper-V emulation. Currently used to synchronize modified
+Hyper-V SynIC state with userspace.
+
/* Fix the size of the union. */
char padding[256];
};
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dfdaf0f..a41d7ed 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -392,6 +392,7 @@ struct kvm_vcpu_hv {
u64 hv_vapic;
s64 runtime_offset;
struct kvm_vcpu_hv_synic synic;
+   struct kvm_hyperv_exit exit;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8ff71f3..9443920 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -129,6 +129,20 @@ static void kvm_hv_notify_acked_sint(struct kvm_vcpu 
*vcpu, u32 sint)
srcu_read_unlock(>irq_srcu, idx);
 }
 
+static void synic_exit(struct kvm_vcpu_hv_synic *synic, u32 msr)
+{
+   struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
+   struct kvm_vcpu_hv *hv_vcpu = >arch.hyperv;
+
+   hv_vcpu->exit.type = KVM_EXIT_HYPERV_SYNIC;
+   hv_vcpu->exit.u.synic.msr = msr;
+   hv_vcpu->exit.u.synic.control = synic->control;
+   hv_vcpu->exit.u.synic.evt_page = synic->evt_page;
+   hv_vcpu->exit.u.synic.msg_page = synic->msg_page;
+
+   kvm_make_request(KVM_REQ_HV_EXIT, vcpu);
+}
+
 static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
 u32 msr, u64 data, bool host)
 {
@@ -141,6 +155,7 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
switch (msr) {
case HV_X64_MSR_SCONTROL:
synic->control = data;
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_SVERSION:
if (!host) {
@@ -157,6 +172,7 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
break;
}
synic->evt_page = data;
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_SIMP:
if (data & HV_SYNIC_SIMP_ENABLE)
@@ -166,6 +182,7 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
break;
}
synic->msg_page = data;
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_EOM: {
int i;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 807d124..9453207 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6342,6 +6342,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
r = 0;
goto out;
}
+   if (kvm_check_request(KVM_REQ_HV_EXIT, vcpu)) {
+   vcpu->run->exit_reason = KVM_EXIT_HYPERV;
+   vcpu->run->hyperv = vcpu->arch.hyperv.exit;
+   r = 0;
+   goto out;
+   }
}
 
/*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 43b0141..e38ac16 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -143,6 +143,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_HV_CRASH  27
 #define KVM_REQ_IOAPIC_EOI_EXIT   28
 #define KVM_REQ_HV_RESET  29
+#define KVM_REQ_HV_EXIT   30
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 

[PATCH 6/9] drivers/hv: share Hyper-V SynIC constants with userspace

2015-10-16 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Moved Hyper-V synic contants from guest Hyper-V drivers private
header into x86 arch uapi Hyper-V header.

Added Hyper-V synic msr's flags into x86 arch uapi Hyper-V header.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/x86/include/uapi/asm/hyperv.h | 12 
 drivers/hv/hyperv_vmbus.h  |  5 -
 include/linux/hyperv.h |  1 +
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/uapi/asm/hyperv.h 
b/arch/x86/include/uapi/asm/hyperv.h
index 2677a0a..040d408 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -257,4 +257,16 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
__s64 tsc_offset;
 } HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 
+/* Define the number of synthetic interrupt sources. */
+#define HV_SYNIC_SINT_COUNT(16)
+/* Define the expected SynIC version. */
+#define HV_SYNIC_VERSION_1 (0x1)
+
+#define HV_SYNIC_CONTROL_ENABLE(1ULL << 0)
+#define HV_SYNIC_SIMP_ENABLE   (1ULL << 0)
+#define HV_SYNIC_SIEFP_ENABLE  (1ULL << 0)
+#define HV_SYNIC_SINT_MASKED   (1ULL << 16)
+#define HV_SYNIC_SINT_AUTO_EOI (1ULL << 17)
+#define HV_SYNIC_SINT_VECTOR_MASK  (0xFF)
+
 #endif
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 3d70e36..3782636 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -63,9 +63,6 @@ enum hv_cpuid_function {
 /* Define version of the synthetic interrupt controller. */
 #define HV_SYNIC_VERSION   (1)
 
-/* Define the expected SynIC version. */
-#define HV_SYNIC_VERSION_1 (0x1)
-
 /* Define synthetic interrupt controller message constants. */
 #define HV_MESSAGE_SIZE(256)
 #define HV_MESSAGE_PAYLOAD_BYTE_COUNT  (240)
@@ -105,8 +102,6 @@ enum hv_message_type {
HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005
 };
 
-/* Define the number of synthetic interrupt sources. */
-#define HV_SYNIC_SINT_COUNT(16)
 #define HV_SYNIC_STIMER_COUNT  (4)
 
 /* Define invalid partition identifier. */
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 54733d5..8fdc17b 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -26,6 +26,7 @@
 #define _HYPERV_H
 
 #include 
+#include 
 
 #include 
 #include 
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH 2/2] kvm/x86: Hyper-V kvm exit

2015-10-12 Thread Denis V. Lunev

On 10/12/2015 04:42 PM, Eric Blake wrote:

On 10/09/2015 07:39 AM, Denis V. Lunev wrote:

From: Andrey Smetanin <asmeta...@virtuozzo.com>

A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V synic configuraion triggered by guest writing to the

s/configuraion/configuration/
Is 'synic' intended?  Is it short for something (if so, spelling it out
may help)?



+++ b/Documentation/virtual/kvm/api.txt
@@ -3331,6 +3331,12 @@ the userspace IOAPIC should process the EOI and 
retrigger the interrupt if
  it is still asserted.  Vector is the LAPIC interrupt vector for which the
  EOI was received.
  
+		/* KVM_EXIT_HYPERV */

+struct kvm_hyperv_exit hyperv;
+Indicates that the VCPU's exits into userspace to process some tasks

s/VCPU's/VCPU/


+related with Hyper-V emulation. Currently used to synchronize modified
+Hyper-V synic state with userspace.

Again, is 'synic' intended?  Hmm, I see it throughout the patch, so it
looks intentional, but I keep trying to read it as a typo for 'sync'.


this is not a typo :)

this is an abbreviation for synthetic interrupt controller,
pls compare with this

./arch/x86/include/uapi/asm/hyperv.h: * Basic SynIC MSRs 
(HV_X64_MSR_SCONTROL through HV_X64_MSR_EOM


Though there is some sense in the question itself. I think
that it would be better to keep naming it SynIC as
in the original kernel code.

Den
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/2] kvm/x86: Hyper-V synthetic interrupt controller

2015-10-09 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Synic is a lapic extension, which is controlled via MSRs and maintains
for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/powerpc/kvm/mpic.c |  18 +++
 arch/s390/kvm/interrupt.c   |  18 +++
 arch/x86/include/asm/kvm_host.h |  14 +++
 arch/x86/kvm/hyperv.c   | 266 
 arch/x86/kvm/hyperv.h   |  20 +++
 arch/x86/kvm/irq_comm.c |  16 +++
 arch/x86/kvm/lapic.c|  15 ++-
 arch/x86/kvm/lapic.h|   5 +
 arch/x86/kvm/x86.c  |   4 +
 drivers/hv/hyperv_vmbus.h   |   5 -
 include/linux/kvm_host.h|  12 ++
 include/uapi/linux/hyperv.h |  12 ++
 include/uapi/linux/kvm.h|   8 ++
 virt/kvm/arm/vgic.c |  18 +++
 virt/kvm/eventfd.c  |  35 +-
 virt/kvm/irqchip.c  |  24 +++-
 16 files changed, 475 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 6249cdc..01e7fb4 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1850,3 +1850,21 @@ int kvm_set_routing_entry(struct 
kvm_kernel_irq_routing_entry *e,
 out:
return r;
 }
+
+/* Hyper-V Synic not implemented */
+int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e,
+   struct kvm *kvm, int irq_source_id, int level,
+   bool line_status)
+{
+   return -ENOTSUP;
+}
+
+int kvm_hv_get_sint_gsi(struct kvm_vcpu *vcpu, u32 sint)
+{
+   return -ENOTSUP;
+}
+
+int kvm_hv_set_sint_gsi(struct kvm *kvm, u32 vcpu_id, u32 sint, int gsi)
+{
+   return -ENOTSUP;
+}
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 5c2c169..7fa8d9d 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2285,3 +2285,21 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 
__user *buf, int len)
 
return n;
 }
+
+/* Hyper-V Synic not implemented */
+int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e,
+   struct kvm *kvm, int irq_source_id, int level,
+   bool line_status)
+{
+   return -ENOTSUP;
+}
+
+int kvm_hv_get_sint_gsi(struct kvm_vcpu *vcpu, u32 sint)
+{
+   return -ENOTSUP;
+}
+
+int kvm_hv_set_sint_gsi(struct kvm *kvm, u32 vcpu_id, u32 sint, int gsi)
+{
+   return -ENOTSUP;
+}
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cdbdb55..e614a543 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -374,10 +375,23 @@ struct kvm_mtrr {
struct list_head head;
 };
 
+/* Hyper-V synthetic interrupt controller */
+struct kvm_vcpu_hv_synic {
+   u64 version;
+   u64 control;
+   u64 msg_page;
+   u64 evt_page;
+   atomic64_t sint[HV_SYNIC_SINT_COUNT];
+   atomic_t sint_to_gsi[HV_SYNIC_SINT_COUNT];
+   DECLARE_BITMAP(auto_eoi_bitmap, 256);
+   DECLARE_BITMAP(vec_bitmap, 256);
+};
+
 /* Hyper-V per vcpu emulation context */
 struct kvm_vcpu_hv {
u64 hv_vapic;
s64 runtime_offset;
+   struct kvm_vcpu_hv_synic synic;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 62cf8c9..15c3c02 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -23,13 +23,265 @@
 
 #include "x86.h"
 #include "lapic.h"
+#include "ioapic.h"
 #include "hyperv.h"
 
 #include 
+#include 
 #include 
 
 #include "trace.h"
 
+static inline u64 synic_read_sint(struct kvm_vcpu_hv_synic *synic, int sint)
+{
+   return atomic64_read(>sint[sint]);
+}
+
+static inline int synic_get_sint_vector(u64 sint_value)
+{
+   if (sint_value &a

[PATCH 2/2] kvm/x86: Hyper-V kvm exit

2015-10-09 Thread Denis V. Lunev
From: Andrey Smetanin <asmeta...@virtuozzo.com>

A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V synic configuraion triggered by guest writing to the
corresponding MSRs.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtiozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
---
 Documentation/virtual/kvm/api.txt |  6 ++
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/hyperv.c | 17 +
 arch/x86/kvm/x86.c|  6 ++
 include/linux/kvm_host.h  |  1 +
 include/uapi/linux/kvm.h  | 17 +
 6 files changed, 48 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 34cc068..cffe670 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3331,6 +3331,12 @@ the userspace IOAPIC should process the EOI and 
retrigger the interrupt if
 it is still asserted.  Vector is the LAPIC interrupt vector for which the
 EOI was received.
 
+   /* KVM_EXIT_HYPERV */
+struct kvm_hyperv_exit hyperv;
+Indicates that the VCPU's exits into userspace to process some tasks
+related with Hyper-V emulation. Currently used to synchronize modified
+Hyper-V synic state with userspace.
+
/* Fix the size of the union. */
char padding[256];
};
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e614a543..f515e01 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -392,6 +392,7 @@ struct kvm_vcpu_hv {
u64 hv_vapic;
s64 runtime_offset;
struct kvm_vcpu_hv_synic synic;
+   struct kvm_hyperv_exit exit;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 15c3c02..174ce041 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -91,6 +91,20 @@ static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, 
int sint, u64 data)
return 0;
 }
 
+static void synic_exit(struct kvm_vcpu_hv_synic *synic, u32 msr)
+{
+   struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
+   struct kvm_vcpu_hv *hv_vcpu = >arch.hyperv;
+
+   hv_vcpu->exit.type = KVM_EXIT_HYPERV_SYNIC;
+   hv_vcpu->exit.u.synic.msr = msr;
+   hv_vcpu->exit.u.synic.control = synic->control;
+   hv_vcpu->exit.u.synic.evt_page = synic->evt_page;
+   hv_vcpu->exit.u.synic.msg_page = synic->msg_page;
+
+   kvm_make_request(KVM_REQ_HV_EXIT, vcpu);
+}
+
 static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
 u32 msr, u64 data, bool host)
 {
@@ -103,6 +117,7 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
switch (msr) {
case HV_X64_MSR_SCONTROL:
synic->control = data;
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_SVERSION:
if (!host) {
@@ -119,6 +134,7 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
break;
}
synic->evt_page = data;
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_SIMP:
if (data & HV_SYNIC_SIMP_ENABLE)
@@ -128,6 +144,7 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
break;
}
synic->msg_page = data;
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_EOM: {
int i;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7580e9c..4c80d18 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6335,6 +6335,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
r = 0;
goto out;
}
+   if (kvm_check_request(KVM_REQ_HV_EXIT, vcpu)) {
+   vcpu->run->exit_reason = KVM_EXIT_HYPERV;
+   vcpu->run->hyperv = vcpu->arch.hyperv.exit;
+   r = 0;
+   goto out;
+   }
}
 
/*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 30fac73..d80b031 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -143,6 +143,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_HV_CRASH  27
 #define KVM_REQ_IOAPIC_EOI_EXIT   28
 #define KVM_REQ_HV_RESET  29
+#define KVM_REQ_HV_EXIT   30
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID   1
diff --git a

[PATCH 0/2] Hyper-V synthetic interrupt controller

2015-10-09 Thread Denis V. Lunev
This patchset implements the KVM part of the synthetic interrupt
controller (synic) which is a building block of the Hyper-V
paravirtualized device bus (vmbus).

Synic is a lapic extension, which is controlled via MSRs and maintains
for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Besides, a new vcpu exit is introduced to notify the userspace of the
changes in synic configuraion triggered by guest writing to the
corresponding MSRs.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
CC: "K. Y. Srinivasan" <k...@microsoft.com>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] virtio_balloon: do not change memory amount visible via /proc/meminfo

2015-08-31 Thread Denis V. Lunev

On 08/20/2015 12:49 AM, Denis V. Lunev wrote:

Though there is a problem in this setup. The end-user and hosting provider
have signed SLA agreement in which some amount of memory is guaranted for
the guest. The good thing is that this memory will be given to the guest
when the guest will really need it (f.e. with OOM in guest and with
VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing
is that end-user does not know this.

Balloon by default reduce the amount of memory exposed to the end-user
each time when the page is stolen from guest or returned back by using
adjust_managed_page_count and thus /proc/meminfo shows reduced amount
of memory.

Fortunately the solution is simple, we should just avoid to call
adjust_managed_page_count with VIRTIO_BALLOON_F_DEFLATE_ON_OOM set.

Please note that neither VMWare ballon nor HyperV balloon do not care
about proper handling of adjust_managed_page_count at all.

Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Michael S. Tsirkin <m...@redhat.com>

ping

Michael, the issue is important for us. Can you pls look?

Den
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] virtio_balloon: do not change memory amount visible via /proc/meminfo

2015-08-19 Thread Denis V. Lunev
Balloon device is frequently used as a mean of cooperative memory control
in between guest and host to manage memory overcommitment. This is the
typical case for any hosting workload when KVM guest is provided for
end-user.

Though there is a problem in this setup. The end-user and hosting provider
have signed SLA agreement in which some amount of memory is guaranted for
the guest. The good thing is that this memory will be given to the guest
when the guest will really need it (f.e. with OOM in guest and with
VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing
is that end-user does not know this.

Balloon by default reduce the amount of memory exposed to the end-user
each time when the page is stolen from guest or returned back by using
adjust_managed_page_count and thus /proc/meminfo shows reduced amount
of memory.

Fortunately the solution is simple, we should just avoid to call
adjust_managed_page_count with VIRTIO_BALLOON_F_DEFLATE_ON_OOM set.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8543c9a..7efc329 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -157,7 +157,9 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
}
set_page_pfns(vb-pfns + vb-num_pfns, page);
vb-num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
-   adjust_managed_page_count(page, -1);
+   if (!virtio_has_feature(vb-vdev,
+   VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+   adjust_managed_page_count(page, -1);
}
 
/* Did we get any? */
@@ -173,7 +175,9 @@ static void release_pages_balloon(struct virtio_balloon *vb)
/* Find pfns pointing at start of each page, get pages and free them. */
for (i = 0; i  vb-num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
struct page *page = balloon_pfn_to_page(vb-pfns[i]);
-   adjust_managed_page_count(page, 1);
+   if (!virtio_has_feature(vb-vdev,
+   VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+   adjust_managed_page_count(page, 1);
put_page(page); /* balloon reference */
}
 }
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 0/2] virtio_balloon: do not change memory amount visible via /proc/meminfo

2015-08-19 Thread Denis V. Lunev
Though there is a problem in this setup. The end-user and hosting provider
have signed SLA agreement in which some amount of memory is guaranted for
the guest. The good thing is that this memory will be given to the guest
when the guest will really need it (f.e. with OOM in guest and with
VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing
is that end-user does not know this.

Balloon by default reduce the amount of memory exposed to the end-user
each time when the page is stolen from guest or returned back by using
adjust_managed_page_count and thus /proc/meminfo shows reduced amount
of memory.

Fortunately the solution is simple, we should just avoid to call
adjust_managed_page_count with VIRTIO_BALLOON_F_DEFLATE_ON_OOM set.

Please note that neither VMWare ballon nor HyperV balloon do not care
about proper handling of adjust_managed_page_count at all.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Michael S. Tsirkin m...@redhat.com
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/2] virtio_ballon: change stub of release_pages_by_pfn

2015-08-19 Thread Denis V. Lunev
and rename it to release_pages_balloon. The function originally takes
arrays of pfns and now it takes pointer to struct virtio_ballon.
This change is necessary to conditionally call adjust_managed_page_count
in the next patch.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 82e80e0..8543c9a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -166,13 +166,13 @@ static void fill_balloon(struct virtio_balloon *vb, 
size_t num)
mutex_unlock(vb-balloon_lock);
 }
 
-static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
+static void release_pages_balloon(struct virtio_balloon *vb)
 {
unsigned int i;
 
/* Find pfns pointing at start of each page, get pages and free them. */
-   for (i = 0; i  num; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   struct page *page = balloon_pfn_to_page(pfns[i]);
+   for (i = 0; i  vb-num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   struct page *page = balloon_pfn_to_page(vb-pfns[i]);
adjust_managed_page_count(page, 1);
put_page(page); /* balloon reference */
}
@@ -206,7 +206,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, 
size_t num)
if (vb-num_pfns != 0)
tell_host(vb, vb-deflate_vq);
mutex_unlock(vb-balloon_lock);
-   release_pages_by_pfn(vb-pfns, vb-num_pfns);
+   release_pages_balloon(vb);
return num_freed_pages;
 }
 
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC PATCH 1/1] mshyperv: fix recognition of Hyper-V guest crash MSR's

2015-07-02 Thread Denis V. Lunev
From: Andrey Smetanin asmeta...@virtuozzo.com

Hypervisor Top Level Functional Specification v3.1/4.0 notes that cpuid
(0x4003) EDX's 10th bit should be used to check that Hyper-V guest
crash MSR's functionality available.

This patch should fix this recognition. Currently the code checks EAX
register instead of EDX.

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Nick Meier nme...@microsoft.com
CC: K. Y. Srinivasan k...@microsoft.com
CC: Haiyang Zhang haiya...@microsoft.com
---
 arch/x86/include/asm/mshyperv.h | 1 +
 arch/x86/kernel/cpu/mshyperv.c  | 1 +
 drivers/hv/vmbus_drv.c  | 4 ++--
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index c163215..eebe433 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -7,6 +7,7 @@
 
 struct ms_hyperv_info {
u32 features;
+   u32 misc_features;
u32 hints;
 };
 
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index aad4bd8..6d172a2 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -114,6 +114,7 @@ static void __init ms_hyperv_init_platform(void)
 * Extract the features and hints
 */
ms_hyperv.features = cpuid_eax(HYPERV_CPUID_FEATURES);
+   ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES);
ms_hyperv.hints= cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);
 
printk(KERN_INFO HyperV: features 0x%x, hints 0x%x\n,
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index cf20400..67af13a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -841,7 +841,7 @@ static int vmbus_bus_init(int irq)
/*
 * Only register if the crash MSRs are available
 */
-   if (ms_hyperv.features  HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
+   if (ms_hyperv.misc_features  HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
atomic_notifier_chain_register(panic_notifier_list,
   hyperv_panic_block);
}
@@ -1110,7 +1110,7 @@ static void __exit vmbus_exit(void)
hv_remove_vmbus_irq();
tasklet_kill(msg_dpc);
vmbus_free_channels();
-   if (ms_hyperv.features  HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
+   if (ms_hyperv.misc_features  HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
atomic_notifier_chain_unregister(panic_notifier_list,
 hyperv_panic_block);
}
-- 
2.1.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-15 Thread Denis V. Lunev

On 14/10/14 13:10, Michael S. Tsirkin wrote:

On Tue, Oct 14, 2014 at 10:14:05AM +1030, Rusty Russell wrote:

Michael S. Tsirkin m...@redhat.com writes:


On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:

Denis V. Lunev d...@parallels.com writes:

From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.


This makes some amount of sense.


This reminds me of the balloon fs that Google once proposed.
This really needs to be controlled from host though.
At the moment host does not expect guest to deflate before
requests.
So as a minimum, add a feature bit for this.  what if you want a mix of
mandatory and optional balooning? I guess we can use multiple balloons,
is that the idea?


Trying to claw back some pages on OOM is almost certainly correct,
even if the host doesn't expect it.  It's roughly equivalent to not
giving up pages in the first place.


Well the difference is that there are management tools that
poll balloon in host until they see balloon size reaches
the expected value.

They don't expect balloon to shrink below num_pages and will respond in various
unexpected ways like e.g. killing the VM if it does.
Killing a userspace process within the guest might be better
for VM health.

Besides the fact that we always did it like this, these tools seem to have
basis in the spec.
Specifically, this is based on this text from the spec:
the device asks for a certain amount of memory, and the driver
supplies it (or withdraws it, if the device has more than it asks for).
This allows the guest to adapt to changes in allowance of underlying
physical memory.

and

The device is driven by the receipt of a configuration change interrupt.




Cheers,
Rusty.
PS.  Yes, a real guest-driven balloon is preferable, but that's a much
  larger task.



Any objection to making the feature depend on a feature flag?




OK. I got the point. This sounds good for me. We will prepare patch for
kernel and proper bits (command line option) for QEMU.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/2] virtio_balloon: return the amount of freed memory from leak_balloon()

2014-10-15 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

This value would be useful in the next patch to provide the amount of
the freed memory for OOM killer.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f893148..66cac10 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -168,8 +168,9 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned 
int num)
}
 }
 
-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
+   unsigned num_freed_pages;
struct page *page;
struct balloon_dev_info *vb_dev_info = vb-vb_dev_info;
 
@@ -186,6 +187,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
 
+   num_freed_pages = vb-num_pfns;
/*
 * Note that if
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
@@ -195,6 +197,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
tell_host(vb, vb-deflate_vq);
mutex_unlock(vb-balloon_lock);
release_pages_by_pfn(vb-pfns, vb-num_pfns);
+   return num_freed_pages;
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.9.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 0/2] shrink virtio baloon on OOM in guest

2014-10-15 Thread Denis V. Lunev
Excessive virtio_balloon inflation can cause invocation of OOM-killer, when
Linux is under severe memory pressure. Various mechanisms are responsible for
correct virtio_balloon memory management. Nevertheless it is often the case
that these control tools does not have enough time to react on fast changing
memory load. As a result OS runs out of memory and invokes OOM-killer.
The balancing of memory by use of the virtio balloon should not cause the
termination of processes while there are pages in the balloon. Now there is
no way for virtio balloon driver to free memory at the last moment before
some process get killed by OOM-killer.

This does not provide a security breach as baloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is expected
to be called from the oom notifier call chain in out_of_memory() function.
If virtio balloon could release some memory, it will make the system to
return and retry the allocation that forced the out of memory killer to run.

Patch 1 of this series adds support for implementation of virtio_balloon
callback, so now leak_balloon() function returns number of freed pages.
Patch 2 implements virtio_balloon callback itself.

Changes from v2:
- added feature bit to control OOM baloon behavior from host
Changes from v1:
- minor cosmetic tweaks suggested by rusty@

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] virtio_balloon: free some memory from balloon on OOM

2014-10-15 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make
the system to return and retry the allocation that forced the out of
memory killer to run.

Allocate virtio  feature bit for this: it is not set by default,
the the guest will not deflate virtio balloon on OOM without explicit
permission from host.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 52 +
 include/uapi/linux/virtio_balloon.h |  1 +
 2 files changed, 53 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 66cac10..88d73a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -28,6 +28,7 @@
 #include linux/slab.h
 #include linux/module.h
 #include linux/balloon_compaction.h
+#include linux/oom.h
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -36,6 +37,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE  
VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
+#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+
+static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_pages, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(oom_pages, pages to free on OOM);
 
 struct virtio_balloon
 {
@@ -71,6 +78,9 @@ struct virtio_balloon
/* Memory statistics */
int need_stats_update;
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+   /* To register callback in oom notifier call chain */
+   struct notifier_block nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -290,6 +300,38 @@ static void update_balloon_size(struct virtio_balloon *vb)
  actual);
 }
 
+/*
+ * virtballoon_oom_notify - release pages when system is under severe
+ * memory pressure (called from out_of_memory())
+ * @self : notifier block struct
+ * @dummy: not used
+ * @parm : returned - number of freed pages
+ *
+ * The balancing of memory by use of the virtio balloon should not cause
+ * the termination of processes while there are pages in the balloon.
+ * If virtio balloon manages to release some memory, it will make the
+ * system return and retry the allocation that forced the OOM killer
+ * to run.
+ */
+static int virtballoon_oom_notify(struct notifier_block *self,
+ unsigned long dummy, void *parm)
+{
+   struct virtio_balloon *vb;
+   unsigned long *freed;
+   unsigned num_freed_pages;
+
+   vb = container_of(self, struct virtio_balloon, nb);
+   if (!virtio_has_feature(vb-vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+   return NOTIFY_OK;
+
+   freed = parm;
+   num_freed_pages = leak_balloon(vb, oom_pages);
+   update_balloon_size(vb);
+   *freed += num_freed_pages;
+
+   return NOTIFY_OK;
+}
+
 static int balloon(void *_vballoon)
 {
struct virtio_balloon *vb = _vballoon;
@@ -446,6 +488,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb;
 
+   vb-nb.notifier_call = virtballoon_oom_notify;
+   vb-nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
+   err = register_oom_notifier(vb-nb);
+   if (err  0)
+   goto out_oom_notify;
+
vb-thread = kthread_run(balloon, vb, vballoon);
if (IS_ERR(vb-thread)) {
err = PTR_ERR(vb-thread);
@@ -455,6 +503,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
return 0;
 
 out_del_vqs:
+   unregister_oom_notifier(vb-nb);
+out_oom_notify:
vdev-config-del_vqs(vdev);
 out_free_vb:
kfree(vb);
@@ -479,6 +529,7 @@ static void virtballoon_remove(struct virtio_device *vdev

[PATCH 1/2] virtio_balloon: return the amount of freed memory from leak_balloon()

2014-10-14 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

This value would be useful in the next patch to provide the amount of
the freed memory for OOM killer.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f893148..66cac10 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -168,8 +168,9 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned 
int num)
}
 }
 
-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
+   unsigned num_freed_pages;
struct page *page;
struct balloon_dev_info *vb_dev_info = vb-vb_dev_info;
 
@@ -186,6 +187,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
 
+   num_freed_pages = vb-num_pfns;
/*
 * Note that if
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
@@ -195,6 +197,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
tell_host(vb, vb-deflate_vq);
mutex_unlock(vb-balloon_lock);
release_pages_by_pfn(vb-pfns, vb-num_pfns);
+   return num_freed_pages;
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.9.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] virtio_balloon: free some memory from balloon on OOM

2014-10-14 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.

This does not provide a security breach as balloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make
the system to return and retry the allocation that forced the out of
memory killer to run.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 48 +
 1 file changed, 48 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 66cac10..ab1fa69 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -28,6 +28,7 @@
 #include linux/slab.h
 #include linux/module.h
 #include linux/balloon_compaction.h
+#include linux/oom.h
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -36,6 +37,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE  
VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
+#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+
+static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_pages, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(oom_pages, pages to free on OOM);
 
 struct virtio_balloon
 {
@@ -71,6 +78,9 @@ struct virtio_balloon
/* Memory statistics */
int need_stats_update;
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+   /* To register callback in oom notifier call chain */
+   struct notifier_block nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -290,6 +300,35 @@ static void update_balloon_size(struct virtio_balloon *vb)
  actual);
 }
 
+/*
+ * virtballoon_oom_notify - release pages when system is under severe
+ *  memory pressure (called from out_of_memory())
+ * @self : notifier block struct
+ * @dummy: not used
+ * @parm : returned - number of freed pages
+ *
+ * The balancing of memory by use of the virtio balloon should not cause
+ * the termination of processes while there are pages in the balloon.
+ * If virtio balloon manages to release some memory, it will make the
+ * system return and retry the allocation that forced the OOM killer
+ * to run.
+ */
+static int virtballoon_oom_notify(struct notifier_block *self,
+ unsigned long dummy, void *parm)
+{
+   unsigned num_freed_pages;
+   unsigned long *freed;
+   struct virtio_balloon *vb;
+
+   freed = parm;
+   vb = container_of(self, struct virtio_balloon, nb);
+   num_freed_pages = leak_balloon(vb, oom_pages);
+   update_balloon_size(vb);
+   *freed += num_freed_pages;
+
+   return NOTIFY_OK;
+}
+
 static int balloon(void *_vballoon)
 {
struct virtio_balloon *vb = _vballoon;
@@ -446,6 +485,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb;
 
+   vb-nb.notifier_call = virtballoon_oom_notify;
+   vb-nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
+   err = register_oom_notifier(vb-nb);
+   if (err  0)
+   goto out_oom_notify;
+
vb-thread = kthread_run(balloon, vb, vballoon);
if (IS_ERR(vb-thread)) {
err = PTR_ERR(vb-thread);
@@ -455,6 +500,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
return 0;
 
 out_del_vqs:
+   unregister_oom_notifier(vb-nb);
+out_oom_notify:
vdev-config-del_vqs(vdev);
 out_free_vb:
kfree(vb);
@@ -479,6 +526,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
struct virtio_balloon *vb = vdev-priv;
 
+   unregister_oom_notifier(vb-nb);
kthread_stop(vb-thread);
remove_common(vb);
kfree(vb);
-- 
1.9.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https

Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-13 Thread Denis V. Lunev

On 13/10/14 09:32, Rusty Russell wrote:

Denis V. Lunev d...@parallels.com writes:

From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.

This makes some amount of sense.

But I suggest a few minor changes:


+static int oom_vballoon_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_vballoon_pages, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(oom_vballoon_pages, pages to free on OOM);

Since this is already prefixed with virtio_balloon. I suggest just
calling it oom_pages.

ok, will do


+static int virtballoon_oom_notify(struct notifier_block *self,
+ unsigned long dummy, void *parm)
+{
+   unsigned int num_freed_pages;
+   unsigned long *freed = (unsigned long *)parm;
+   struct virtio_balloon *vb = container_of((struct notifier_block *)self,
+struct virtio_balloon, nb);

Why cast self here?
this is a piece from a previous version of the patch, I'll fix this and 
resend.



+   num_freed_pages = leak_balloon(vb, oom_vballoon_pages);
+   update_balloon_size(vb);
+   *freed += num_freed_pages;
+
+   return NOTIFY_OK;
+}

Cheers,
Rusty.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-08 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.

This does not provide a security breach as baloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make
the system to return and retry the allocation that forced the out of
memory killer to run.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
CC: virtualization@lists.linux-foundation.org
---
 drivers/virtio/virtio_balloon.c | 46 +
 1 file changed, 46 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 213da41..ca77831 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -28,6 +28,7 @@
 #include linux/slab.h
 #include linux/module.h
 #include linux/balloon_compaction.h
+#include linux/oom.h
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -36,6 +37,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE  
VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
+#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+
+static int oom_vballoon_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_vballoon_pages, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(oom_vballoon_pages, pages to free on OOM);
 
 struct virtio_balloon
 {
@@ -71,6 +78,9 @@ struct virtio_balloon
/* Memory statistics */
int need_stats_update;
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+   /* To register callback in oom notifier call chain */
+   struct notifier_block nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -290,6 +300,33 @@ static void update_balloon_size(struct virtio_balloon *vb)
  actual);
 }
 
+/*
+ * virtballoon_oom_notify - release pages when system is under severe
+ *  memory pressure (called from out_of_memory())
+ * @self : notifier block struct
+ * @dummy: not used
+ * @parm : returned - number of freed pages
+ *
+ * The balancing of memory by use of the virtio balloon should not cause
+ * the termination of processes while there are pages in the balloon.
+ * If virtio balloon manages to release some memory, it will make the system
+ * return and retry the allocation that forced the OOM killer to run.
+ */
+static int virtballoon_oom_notify(struct notifier_block *self,
+ unsigned long dummy, void *parm)
+{
+   unsigned int num_freed_pages;
+   unsigned long *freed = (unsigned long *)parm;
+   struct virtio_balloon *vb = container_of((struct notifier_block *)self,
+struct virtio_balloon, nb);
+
+   num_freed_pages = leak_balloon(vb, oom_vballoon_pages);
+   update_balloon_size(vb);
+   *freed += num_freed_pages;
+
+   return NOTIFY_OK;
+}
+
 static int balloon(void *_vballoon)
 {
struct virtio_balloon *vb = _vballoon;
@@ -474,6 +511,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb_mapping;
 
+   vb-nb.notifier_call = virtballoon_oom_notify;
+   vb-nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
+   err = register_oom_notifier(vb-nb);
+   if (err  0)
+   goto out_oom_notify;
+
vb-thread = kthread_run(balloon, vb, vballoon);
if (IS_ERR(vb-thread)) {
err = PTR_ERR(vb-thread);
@@ -483,6 +526,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
return 0;
 
 out_del_vqs:
+   unregister_oom_notifier(vb-nb);
+out_oom_notify:
vdev-config-del_vqs(vdev);
 out_free_vb_mapping:
balloon_mapping_free(vb_mapping);
@@ -511,6 +556,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
struct virtio_balloon *vb = vdev-priv;
 
+   unregister_oom_notifier(vb-nb);
kthread_stop(vb-thread

[PATCH 1/2] virtio_balloon: return the amount of freed memory from leak_balloon()

2014-10-08 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

This value would be useful in the next patch to provide the amount of
the freed memory for OOM killer.

Accessing to vb-num_pfns outside of vb-balloon_lock is wrong and unsafe.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
CC: virtualization@lists.linux-foundation.org
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 25ebe8e..213da41 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -168,8 +168,9 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned 
int num)
}
 }
 
-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
+   unsigned num_freed_pages;
struct page *page;
struct balloon_dev_info *vb_dev_info = vb-vb_dev_info;
 
@@ -186,6 +187,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
 
+   num_freed_pages = vb-num_pfns;
/*
 * Note that if
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
@@ -195,6 +197,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
tell_host(vb, vb-deflate_vq);
mutex_unlock(vb-balloon_lock);
release_pages_by_pfn(vb-pfns, vb-num_pfns);
+   return num_freed_pages;
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.9.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 0/2] shrink virtio baloon on OOM in guest

2014-10-08 Thread Denis V. Lunev
Excessive virtio_balloon inflation can cause invocation of OOM-killer, when
Linux is under severe memory pressure. Various mechanisms are responsible for
correct virtio_balloon memory management. Nevertheless it is often the case
that these control tools does not have enough time to react on fast changing
memory load. As a result OS runs out of memory and invokes OOM-killer.
The balancing of memory by use of the virtio balloon should not cause the
termination of processes while there are pages in the balloon. Now there is
no way for virtio balloon driver to free memory at the last moment before
some process get killed by OOM-killer.

This does not provide a security breach as baloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.

To solve the problem, introduce a virtio_balloon callback which is expected
to be called from the oom notifier call chain in out_of_memory() function.
If virtio balloon could release some memory, it will make the system to
return and retry the allocation that forced the out of memory killer to run.

Patch 1 of this series adds support for implementation of virtio_balloon
callback, so now leak_balloon() function returns number of freed pages.
Patch 2 implements virtio_balloon callback itself.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
CC: virtualization@lists.linux-foundation.org

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization