[PATCH 0/7] Drivers: hv: Some miscellaneous fixes

2016-09-02 Thread kys

From: K. Y. Srinivasan 

Some miscellaneous fixes and enhancements. These patches were all
sent earlier but failed to apply clean on Greg's tree. These have
now been rebased.

Alex Ng (2):
  Drivers: hv: utils: Continue to poll VSS channel after handling
    requests.
  Drivers: hv: utils: Check VSS daemon is listening before a hot backup

K. Y. Srinivasan (1):
  Drivers: hv: Introduce a policy for controlling channel affinity

Vitaly Kuznetsov (4):
  Drivers: hv: cleanup vmbus_open() for wrap around mappings
  Drivers: hv: ring_buffer: wrap around mappings for ring buffers
  Drivers: hv: ring_buffer: use wrap around mappings in
    hv_copy{from,to}_ringbuffer()
  Drivers: hv: ring_buffer: count on wrap around mappings in
    get_next_pkt_raw()

 drivers/hv/channel.c      |   66 ---
 drivers/hv/channel_mgmt.c |   68 +++--
 drivers/hv/hv_snapshot.c  |   93 ++---
 drivers/hv/hyperv_vmbus.h |    4 +-
 drivers/hv/ring_buffer.c  |   61 +
 include/linux/hyperv.h    |   55 --
 tools/hv/hv_vss_daemon.c  |    3 +
 7 files changed, 193 insertions(+), 157 deletions(-)

-- 
1.7.4.1

[PATCH 5/5] Drivers: hv: balloon: Use available memory value in pressure report

2016-08-24 Thread kys

From: Alex Ng 

Reports for available memory should use the si_mem_available() value.
The previous freeram value does not include available page cache memory.
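
For illustration only, the floor check that this patch touches in
balloon_up() can be sketched in userspace as below. The helper name
clamp_balloon_request() is hypothetical and not part of hv_balloon.c, all
quantities are assumed to be unsigned page counts, and PAGES_IN_2M is
assumed to be 512 (4K pages). The point is that whatever value is passed
as avail_pages directly bounds how much the guest agrees to balloon out.

```c
#include <assert.h>

/* Assumed value: number of 4K pages in 2M (matches alloc_unit = 512). */
#define PAGES_IN_2M 512UL

/*
 * Hypothetical helper mirroring the floor check in balloon_up():
 * refuse to balloon below the floor and keep the 2M granularity.
 */
unsigned long clamp_balloon_request(unsigned long avail_pages,
				    unsigned long num_pages,
				    unsigned long floor)
{
	if (avail_pages < num_pages || avail_pages - num_pages < floor) {
		num_pages = avail_pages > floor ? (avail_pages - floor) : 0;
		num_pages -= num_pages % PAGES_IN_2M;
		/* A clamped request is always a multiple of 2M. */
		assert(num_pages % PAGES_IN_2M == 0);
	}
	return num_pages;
}
```

With 3000 available pages, a floor of 1000 and a request for 2560 pages,
the sketch trims the request to 1536 pages. Feeding it a larger available
value (one that counts reclaimable page cache) lets more of the same
request through.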

Signed-off-by: Alex Ng 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_balloon.c |   13 +
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index d55e0e7..fdf8da9 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1075,7 +1075,6 @@ static unsigned long compute_balloon_floor(void)
 static void post_status(struct hv_dynmem_device *dm)
 {
struct dm_status status;
-   struct sysinfo val;
unsigned long now = jiffies;
unsigned long last_post = last_post_time;
 
@@ -1087,7 +1086,6 @@ static void post_status(struct hv_dynmem_device *dm)
if (!time_after(now, (last_post_time + HZ)))
return;
 
-   si_meminfo(&val);
memset(&status, 0, sizeof(struct dm_status));
status.hdr.type = DM_STATUS_REPORT;
status.hdr.size = sizeof(struct dm_status);
@@ -1103,7 +1101,7 @@ static void post_status(struct hv_dynmem_device *dm)
 * num_pages_onlined) as committed to the host, otherwise it can try
 * asking us to balloon them out.
 */
-   status.num_avail = val.freeram;
+   status.num_avail = si_mem_available();
status.num_committed = vm_memory_committed() +
dm->num_pages_ballooned +
(dm->num_pages_added > dm->num_pages_onlined ?
@@ -1209,7 +1207,7 @@ static void balloon_up(struct work_struct *dummy)
int ret;
bool done = false;
int i;
-   struct sysinfo val;
+   long avail_pages;
unsigned long floor;
 
/* The host balloons pages in 2M granularity. */
@@ -1221,12 +1219,12 @@ static void balloon_up(struct work_struct *dummy)
 */
alloc_unit = 512;
 
-   si_meminfo(&val);
+   avail_pages = si_mem_available();
floor = compute_balloon_floor();
 
/* Refuse to balloon below the floor, keep the 2M granularity. */
-   if (val.freeram < num_pages || val.freeram - num_pages < floor) {
-   num_pages = val.freeram > floor ? (val.freeram - floor) : 0;
+   if (avail_pages < num_pages || avail_pages - num_pages < floor) {
+   num_pages = avail_pages > floor ? (avail_pages - floor) : 0;
num_pages -= num_pages % PAGES_IN_2M;
}
 
@@ -1237,7 +1235,6 @@ static void balloon_up(struct work_struct *dummy)
bl_resp->hdr.size = sizeof(struct dm_balloon_response);
bl_resp->more_pages = 1;
 
-
num_pages -= num_ballooned;
num_ballooned = alloc_balloon_pages(&dm_device, num_pages,
bl_resp, alloc_unit);
-- 
1.7.4.1

[PATCH 2/5] Drivers: hv: balloon: account for gaps in hot add regions

2016-08-24 Thread kys

From: Vitaly Kuznetsov 

I'm observing the following hot add requests from the WS2012 host:

hot_add_req: start_pfn = 0x108200 count = 330752
hot_add_req: start_pfn = 0x158e00 count = 193536
hot_add_req: start_pfn = 0x188400 count = 239616

As the host doesn't specify hot add regions, we try to create a
128Mb-aligned region covering the first request: we create the 0x108000 -
0x160000 region and add the 0x108000 - 0x158e00 memory. The second request
passes the pfn_covered() check, so we enlarge the region to 0x108000 -
0x190000 and add the 0x158e00 - 0x188200 memory. The problem emerges with the
third request: it starts at 0x188400, so there is a 0x200 gap which is
not covered. As the end of our region is now 0x190000, the request again
passes the pfn_covered() check, where we just adjust covered_end_pfn and make
it 0x188400 instead of 0x188200. This means that we'll try to online the
0x188200-0x188400 pages, but these pages were never assigned to us and we
crash.
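
The arithmetic behind these numbers can be checked with a small userspace
sketch. It assumes a 128M chunk of 0x8000 4K pages; aligned_region() and
struct pfn_range are hypothetical names used only for this illustration
and do not exist in the driver.

```c
#include <assert.h>

/* Assumed: one 128M hot-add chunk expressed in 4K pages. */
#define HA_CHUNK_PAGES 0x8000UL

struct pfn_range {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

/* Hypothetical helper: smallest chunk-aligned range containing a request. */
struct pfn_range aligned_region(unsigned long start_pfn, unsigned long count)
{
	struct pfn_range r;
	unsigned long end_pfn = start_pfn + count;

	assert(count > 0);
	/* Round the start down and the end up to a chunk boundary. */
	r.start = start_pfn - (start_pfn % HA_CHUNK_PAGES);
	r.end = ((end_pfn + HA_CHUNK_PAGES - 1) / HA_CHUNK_PAGES) *
		HA_CHUNK_PAGES;
	return r;
}
```

For the first request (start_pfn 0x108200, count 330752) the request ends
at 0x158e00 and the aligned range is 0x108000 - 0x160000. The second
request ends at 0x188200, which rounds up to 0x190000, while the third one
starts at 0x188400, leaving the 0x200 page gap.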

We can't react to such requests by creating new hot add regions: the whole
suggested range may fall into the previously identified 128Mb-aligned area,
so we would end up either adding nothing or creating intersecting regions,
which our current logic doesn't allow. Instead, create a list of such
'gaps' and check for them in the page online callback.
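
A minimal userspace sketch of the per-page check follows. It works on
plain PFNs and an array instead of struct page pointers and a list_head,
so it is a simplification of the idea and not the driver code; struct
region, struct gap, pfn_can_online() and example_pfn_can_online() are
hypothetical names. The example values are the ones from the requests
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct gap {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

struct region {
	unsigned long covered_start_pfn;
	unsigned long covered_end_pfn;
	const struct gap *gaps;
	size_t nr_gaps;
};

/* A page may be onlined only if it is covered and not inside any gap. */
bool pfn_can_online(const struct region *r, unsigned long pfn)
{
	size_t i;

	assert(r != NULL);
	if (pfn < r->covered_start_pfn || pfn >= r->covered_end_pfn)
		return false;
	for (i = 0; i < r->nr_gaps; i++) {
		if (pfn >= r->gaps[i].start_pfn && pfn < r->gaps[i].end_pfn)
			return false;
	}
	return true;
}

/* Region state after the three requests quoted in the commit message. */
bool example_pfn_can_online(unsigned long pfn)
{
	static const struct gap g = { 0x188200UL, 0x188400UL };
	const struct region r = { 0x108200UL, 0x1c2c00UL, &g, 1 };

	return pfn_can_online(&r, pfn);
}
```

Here 0x1c2c00 is 0x188400 plus the 239616 pages of the third request. Any
PFN in 0x188200 - 0x188400 is rejected even though it lies between
covered_start_pfn and covered_end_pfn.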

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_balloon.c |  131 +-
 1 files changed, 94 insertions(+), 37 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4ae26d6..18766f6 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -441,6 +441,16 @@ struct hv_hotadd_state {
unsigned long covered_end_pfn;
unsigned long ha_end_pfn;
unsigned long end_pfn;
+   /*
+* A list of gaps.
+*/
+   struct list_head gap_list;
+};
+
+struct hv_hotadd_gap {
+   struct list_head list;
+   unsigned long start_pfn;
+   unsigned long end_pfn;
 };
 
 struct balloon_state {
@@ -596,18 +606,46 @@ static struct notifier_block hv_memory_nb = {
.priority = 0
 };
 
+/* Check if the particular page is backed and can be onlined and online it. */
+static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg)
+{
+   unsigned long cur_start_pgp;
+   unsigned long cur_end_pgp;
+   struct hv_hotadd_gap *gap;
+
+   cur_start_pgp = (unsigned long)pfn_to_page(has->covered_start_pfn);
+   cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
 
-static void hv_bring_pgs_online(unsigned long start_pfn, unsigned long size)
+   /* The page is not backed. */
+   if (((unsigned long)pg < cur_start_pgp) ||
+   ((unsigned long)pg >= cur_end_pgp))
+   return;
+
+   /* Check for gaps. */
+   list_for_each_entry(gap, &has->gap_list, list) {
+   cur_start_pgp = (unsigned long)
+   pfn_to_page(gap->start_pfn);
+   cur_end_pgp = (unsigned long)
+   pfn_to_page(gap->end_pfn);
+   if (((unsigned long)pg >= cur_start_pgp) &&
+   ((unsigned long)pg < cur_end_pgp)) {
+   return;
+   }
+   }
+
+   /* This frame is currently backed; online the page. */
+   __online_page_set_limits(pg);
+   __online_page_increment_counters(pg);
+   __online_page_free(pg);
+}
+
+static void hv_bring_pgs_online(struct hv_hotadd_state *has,
+   unsigned long start_pfn, unsigned long size)
 {
int i;
 
-   for (i = 0; i < size; i++) {
-   struct page *pg;
-   pg = pfn_to_page(start_pfn + i);
-   __online_page_set_limits(pg);
-   __online_page_increment_counters(pg);
-   __online_page_free(pg);
-   }
+   for (i = 0; i < size; i++)
+   hv_page_online_one(has, pfn_to_page(start_pfn + i));
 }
 
 static void hv_mem_hot_add(unsigned long start, unsigned long size,
@@ -684,26 +722,24 @@ static void hv_online_page(struct page *pg)
list_for_each(cur, &dm_device.ha_region_list) {
has = list_entry(cur, struct hv_hotadd_state, list);
cur_start_pgp = (unsigned long)
-   pfn_to_page(has->covered_start_pfn);
-   cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
+   pfn_to_page(has->start_pfn);
+   cur_end_pgp = (unsigned long)pfn_to_page(has->end_pfn);
 
-   if (((unsigned long)pg >= cur_start_pgp) &&
-   ((unsigned long)pg < cur_end_pgp)) {
-   /*
-* This frame is currently backed; online the
-* page.
-*/
-   __online_page_set_limits(pg);
-   __online_page_increment_counters(pg);
-

[PATCH 4/5] Drivers: hv: balloon: replace ha_region_mutex with spinlock

2016-08-24 Thread kys

From: Vitaly Kuznetsov 

lockdep reports possible circular locking dependency when udev is used
for memory onlining:

 systemd-udevd/3996 is trying to acquire lock:
  ((memory_chain).rwsem){.+}, at: [] __blocking_notifier_call_chain+0x4e/0xc0

 but task is already holding lock:
  (&dm_device.ha_region_mutex){+.+.+.}, at: [] hv_memory_notifier+0x5e/0xc0 [hv_balloon]
 ...

which is probably a false positive because we take and release
ha_region_mutex from the memory notifier chain depending on the arg. No real
deadlocks have been reported so far (though I'm not really sure about
preemptible kernels...), but we don't really need to hold the mutex
for so long. We use it to protect ha_region_list (and its members) and the
num_pages_onlined counter. None of these operations requires us to sleep
and nothing is slow, so switch to using a spinlock with interrupts disabled.
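
As a rough userspace analogy of the resulting pattern (take the lock,
touch the counter, drop the lock, never sleep in between), the sketch
below uses a C11 atomic_flag as a toy spinlock. It is not
spin_lock_irqsave(): there is no interrupt masking here, and struct
ha_state, ha_lock(), ha_unlock(), ha_notify() and the EV_* events are
hypothetical names for this example only.

```c
#include <assert.h>
#include <stdatomic.h>

enum mem_event { EV_MEM_ONLINE, EV_MEM_OFFLINE, EV_OTHER };

struct ha_state {
	atomic_flag lock;			/* toy spinlock */
	unsigned long num_pages_onlined;	/* protected by lock */
};

void ha_lock(struct ha_state *s)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&s->lock,
						 memory_order_acquire))
		;
}

void ha_unlock(struct ha_state *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Each event takes and releases the lock itself; nothing is held across events. */
void ha_notify(struct ha_state *s, enum mem_event ev, unsigned long nr_pages)
{
	switch (ev) {
	case EV_MEM_ONLINE:
		ha_lock(s);
		s->num_pages_onlined += nr_pages;
		ha_unlock(s);
		break;
	case EV_MEM_OFFLINE:
		ha_lock(s);
		assert(s->num_pages_onlined >= nr_pages);
		s->num_pages_onlined -= nr_pages;
		ha_unlock(s);
		break;
	default:
		break;
	}
}

/* Online 100 pages, then offline 40 of them; returns the counter. */
unsigned long example_online_then_offline(void)
{
	struct ha_state s = { .lock = ATOMIC_FLAG_INIT, .num_pages_onlined = 0 };

	ha_notify(&s, EV_MEM_ONLINE, 100);
	ha_notify(&s, EV_MEM_OFFLINE, 40);
	ha_notify(&s, EV_OTHER, 7);
	return s.num_pages_onlined;
}
```

The contrast with the old scheme is that no lock is acquired in one
notifier event and released in another, which is the pattern lockdep
complained about.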

While at it, replace list_for_each with list_for_each_entry, as we actually
need the entries in all these cases, and drop meaningless list_empty() checks.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_balloon.c |   98 +-
 1 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 3441326..d55e0e7 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -547,7 +547,11 @@ struct hv_dynmem_device {
 */
struct task_struct *thread;
 
-   struct mutex ha_region_mutex;
+   /*
+* Protects ha_region_list, num_pages_onlined counter and individual
+* regions from ha_region_list.
+*/
+   spinlock_t ha_lock;
 
/*
 * A list of hot-add regions.
@@ -571,18 +575,14 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
  void *v)
 {
struct memory_notify *mem = (struct memory_notify *)v;
+   unsigned long flags;
 
switch (val) {
-   case MEM_GOING_ONLINE:
-   mutex_lock(&dm_device.ha_region_mutex);
-   break;
-
case MEM_ONLINE:
+   spin_lock_irqsave(&dm_device.ha_lock, flags);
dm_device.num_pages_onlined += mem->nr_pages;
+   spin_unlock_irqrestore(&dm_device.ha_lock, flags);
case MEM_CANCEL_ONLINE:
-   if (val == MEM_ONLINE ||
-   mutex_is_locked(&dm_device.ha_region_mutex))
-   mutex_unlock(&dm_device.ha_region_mutex);
if (dm_device.ha_waiting) {
dm_device.ha_waiting = false;
complete(&dm_device.ol_waitevent);
@@ -590,10 +590,11 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
break;
 
case MEM_OFFLINE:
-   mutex_lock(&dm_device.ha_region_mutex);
+   spin_lock_irqsave(&dm_device.ha_lock, flags);
dm_device.num_pages_onlined -= mem->nr_pages;
-   mutex_unlock(&dm_device.ha_region_mutex);
+   spin_unlock_irqrestore(&dm_device.ha_lock, flags);
break;
+   case MEM_GOING_ONLINE:
case MEM_GOING_OFFLINE:
case MEM_CANCEL_OFFLINE:
break;
@@ -657,9 +658,12 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
unsigned long start_pfn;
unsigned long processed_pfn;
unsigned long total_pfn = pfn_count;
+   unsigned long flags;
 
for (i = 0; i < (size/HA_CHUNK); i++) {
start_pfn = start + (i * HA_CHUNK);
+
+   spin_lock_irqsave(&dm_device.ha_lock, flags);
has->ha_end_pfn +=  HA_CHUNK;
 
if (total_pfn > HA_CHUNK) {
@@ -671,11 +675,11 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
 
has->covered_end_pfn +=  processed_pfn;
+   spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
init_completion(&dm_device.ol_waitevent);
dm_device.ha_waiting = !memhp_auto_online;
 
-   mutex_unlock(&dm_device.ha_region_mutex);
nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
ret = add_memory(nid, PFN_PHYS((start_pfn)),
(HA_CHUNK << PAGE_SHIFT));
@@ -692,9 +696,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 */
do_hot_add = false;
}
+   spin_lock_irqsave(&dm_device.ha_lock, flags);
has->ha_end_pfn -= HA_CHUNK;
has->covered_end_pfn -=  processed_pfn;
-   mutex_lock(&dm_device.ha_region_mutex);
+   spin_unlock_irqrestore(&dm_device.ha_lock, flags);
break;
}
 
@@ -708,7 +713,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
if

[PATCH 3/5] Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled

2016-08-24 Thread kys

From: Vitaly Kuznetsov 

With the recently introduced in-kernel memory onlining
(MEMORY_HOTPLUG_DEFAULT_ONLINE) there is no point in waiting for pages
to come online in the driver and we can get rid of the waiting.
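
The control flow this introduces can be modelled in a few lines of
userspace C. This is only a sketch: struct hot_add_ctx, hot_add_chunk()
and example_waits() are hypothetical, and the counter increment merely
stands in for the wait_for_completion_timeout() call in the driver.

```c
#include <assert.h>
#include <stdbool.h>

struct hot_add_ctx {
	bool ha_waiting;	/* mirrors dm_device.ha_waiting */
	unsigned int waits;	/* how many times we would have waited */
};

/* Model of one loop iteration of hv_mem_hot_add() for a single chunk. */
void hot_add_chunk(struct hot_add_ctx *ctx, bool memhp_auto_online)
{
	assert(ctx != 0);
	/* Wait only when onlining is left to user space. */
	ctx->ha_waiting = !memhp_auto_online;

	/* add_memory() would be called here in the driver. */

	if (ctx->ha_waiting)
		ctx->waits++;	/* stands in for the 5*HZ bounded wait */
}

/* Number of waits incurred when hot adding the given number of chunks. */
unsigned int example_waits(bool memhp_auto_online, unsigned int chunks)
{
	struct hot_add_ctx ctx = { false, 0 };
	unsigned int i;

	for (i = 0; i < chunks; i++)
		hot_add_chunk(&ctx, memhp_auto_online);
	return ctx.waits;
}
```

With in-kernel onlining enabled the model never waits. Without it, each
chunk still incurs one bounded wait, as before the patch.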

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_balloon.c |   15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 18766f6..3441326 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -673,7 +673,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
has->covered_end_pfn +=  processed_pfn;
 
init_completion(&dm_device.ol_waitevent);
-   dm_device.ha_waiting = true;
+   dm_device.ha_waiting = !memhp_auto_online;
 
mutex_unlock(&dm_device.ha_region_mutex);
nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
@@ -699,12 +699,15 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
 
/*
-* Wait for the memory block to be onlined.
-* Since the hot add has succeeded, it is ok to
-* proceed even if the pages in the hot added region
-* have not been "onlined" within the allowed time.
+* Wait for the memory block to be onlined when memory onlining
+* is done outside of kernel (memhp_auto_online). Since the hot
+* add has succeeded, it is ok to proceed even if the pages in
+* the hot added region have not been "onlined" within the
+* allowed time.
 */
-   wait_for_completion_timeout(&dm_device.ol_waitevent, 5*HZ);
+   if (dm_device.ha_waiting)
+   wait_for_completion_timeout(&dm_device.ol_waitevent,
+   5*HZ);
mutex_lock(&dm_device.ha_region_mutex);
post_status(&dm_device);
}
-- 
1.7.4.1

[PATCH 1/5] Drivers: hv: balloon: keep track of where ha_region starts

2016-08-24 Thread kys

From: Vitaly Kuznetsov 

Windows 2012 (non-R2) does not specify hot add region in hot add requests
and the logic in hot_add_req() is trying to find a 128Mb-aligned region
covering the request. It may also happen that host's requests are not 128Mb
aligned and the created ha_region will start before the first specified
PFN. We can't online these non-present pages but we don't remember the real
start of the region.

This is a regression introduced by the commit 5a75d733 ("Drivers: hv:
hv_balloon: don't lose memory when onlining order is not natural"). While
the idea of keeping the 'moving window' was wrong (as there is no guarantee
that hot add requests come ordered) we should still keep track of
covered_start_pfn. This is not a revert, the logic is different.
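
To make the distinction between the aligned region start and the first
backed page concrete, here is a small userspace sketch. The struct fields
follow the patch, but make_region(), is_backed() and
example_is_backed() are hypothetical helpers, and the PFN values are
illustrative assumptions (an aligned region starting at 0x108000 with the
first request starting at 0x108200 and ending at 0x158e00).

```c
#include <assert.h>
#include <stdbool.h>

struct ha_region {
	unsigned long start_pfn;		/* 128M-aligned region start */
	unsigned long covered_start_pfn;	/* first PFN the host gave us */
	unsigned long covered_end_pfn;		/* end of backed pages */
	unsigned long end_pfn;			/* aligned region end */
};

/* Initialise a region the way the patch does in process_hot_add(). */
struct ha_region make_region(unsigned long pg_start, unsigned long rg_start,
			     unsigned long rg_size)
{
	struct ha_region r;

	assert(pg_start >= rg_start && pg_start < rg_start + rg_size);
	r.start_pfn = rg_start;
	r.covered_start_pfn = pg_start;
	r.covered_end_pfn = pg_start;
	r.end_pfn = rg_start + rg_size;
	return r;
}

/* Only pages in covered_start_pfn:covered_end_pfn may be onlined. */
bool is_backed(const struct ha_region *r, unsigned long pfn)
{
	return pfn >= r->covered_start_pfn && pfn < r->covered_end_pfn;
}

/* Illustrative values; covered_end_pfn is advanced as pages are added. */
bool example_is_backed(unsigned long pfn)
{
	struct ha_region r = make_region(0x108200UL, 0x108000UL, 0x58000UL);

	r.covered_end_pfn = 0x158e00UL;
	return is_backed(&r, pfn);
}
```

A PFN such as 0x108100 lies inside the aligned region but before
covered_start_pfn, so the sketch refuses to online it. Comparing against
start_pfn alone would have accepted it.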

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_balloon.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index df35fb7..4ae26d6 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -430,13 +430,14 @@ struct dm_info_msg {
  * currently hot added. We hot add in multiples of 128M
  * chunks; it is possible that we may not be able to bring
  * online all the pages in the region. The range
- * covered_end_pfn defines the pages that can
+ * covered_start_pfn:covered_end_pfn defines the pages that can
  * be brough online.
  */
 
 struct hv_hotadd_state {
struct list_head list;
unsigned long start_pfn;
+   unsigned long covered_start_pfn;
unsigned long covered_end_pfn;
unsigned long ha_end_pfn;
unsigned long end_pfn;
@@ -682,7 +683,8 @@ static void hv_online_page(struct page *pg)
 
list_for_each(cur, &dm_device.ha_region_list) {
has = list_entry(cur, struct hv_hotadd_state, list);
-   cur_start_pgp = (unsigned long)pfn_to_page(has->start_pfn);
+   cur_start_pgp = (unsigned long)
+   pfn_to_page(has->covered_start_pfn);
cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
 
if (((unsigned long)pg >= cur_start_pgp) &&
@@ -854,6 +856,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
list_add_tail(&ha_region->list, &dm_device.ha_region_list);
ha_region->start_pfn = rg_start;
ha_region->ha_end_pfn = rg_start;
+   ha_region->covered_start_pfn = pg_start;
ha_region->covered_end_pfn = pg_start;
ha_region->end_pfn = rg_start + rg_size;
}
-- 
1.7.4.1

[PATCH 3/5] Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled

2016-08-24 Thread kys

From: Vitaly Kuznetsov 

With the recently introduced in-kernel memory onlining
(MEMORY_HOTPLUG_DEFAULT_ONLINE) these is no point in waiting for pages
to come online in the driver and we can get rid of the waiting.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_balloon.c |   15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 18766f6..3441326 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -673,7 +673,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned 
long size,
has->covered_end_pfn +=  processed_pfn;
 
init_completion(_device.ol_waitevent);
-   dm_device.ha_waiting = true;
+   dm_device.ha_waiting = !memhp_auto_online;
 
mutex_unlock(_device.ha_region_mutex);
nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
@@ -699,12 +699,15 @@ static void hv_mem_hot_add(unsigned long start, unsigned 
long size,
}
 
/*
-* Wait for the memory block to be onlined.
-* Since the hot add has succeeded, it is ok to
-* proceed even if the pages in the hot added region
-* have not been "onlined" within the allowed time.
+* Wait for the memory block to be onlined when memory onlining
+* is done outside of kernel (memhp_auto_online). Since the hot
+* add has succeeded, it is ok to proceed even if the pages in
+* the hot added region have not been "onlined" within the
+* allowed time.
 */
-   wait_for_completion_timeout(_device.ol_waitevent, 5*HZ);
+   if (dm_device.ha_waiting)
+   wait_for_completion_timeout(_device.ol_waitevent,
+   5*HZ);
mutex_lock(_device.ha_region_mutex);
post_status(_device);
}
-- 
1.7.4.1

[PATCH 1/5] Drivers: hv: balloon: keep track of where ha_region starts

2016-08-24 Thread kys

From: Vitaly Kuznetsov 

Windows 2012 (non-R2) does not specify hot add region in hot add requests
and the logic in hot_add_req() is trying to find a 128Mb-aligned region
covering the request. It may also happen that host's requests are not 128Mb
aligned and the created ha_region will start before the first specified
PFN. We can't online these non-present pages but we don't remember the real
start of the region.

This is a regression introduced by the commit 5a75d733 ("Drivers: hv:
hv_balloon: don't lose memory when onlining order is not natural"). While
the idea of keeping the 'moving window' was wrong (as there is no guarantee
that hot add requests come ordered) we should still keep track of
covered_start_pfn. This is not a revert, the logic is different.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_balloon.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index df35fb7..4ae26d6 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -430,13 +430,14 @@ struct dm_info_msg {
  * currently hot added. We hot add in multiples of 128M
  * chunks; it is possible that we may not be able to bring
  * online all the pages in the region. The range
- * covered_end_pfn defines the pages that can
+ * covered_start_pfn:covered_end_pfn defines the pages that can
  * be brough online.
  */
 
 struct hv_hotadd_state {
struct list_head list;
unsigned long start_pfn;
+   unsigned long covered_start_pfn;
unsigned long covered_end_pfn;
unsigned long ha_end_pfn;
unsigned long end_pfn;
@@ -682,7 +683,8 @@ static void hv_online_page(struct page *pg)
 
list_for_each(cur, _device.ha_region_list) {
has = list_entry(cur, struct hv_hotadd_state, list);
-   cur_start_pgp = (unsigned long)pfn_to_page(has->start_pfn);
+   cur_start_pgp = (unsigned long)
+   pfn_to_page(has->covered_start_pfn);
cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
 
if (((unsigned long)pg >= cur_start_pgp) &&
@@ -854,6 +856,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
list_add_tail(&ha_region->list, &dm_device.ha_region_list);
ha_region->start_pfn = rg_start;
ha_region->ha_end_pfn = rg_start;
+   ha_region->covered_start_pfn = pg_start;
ha_region->covered_end_pfn = pg_start;
ha_region->end_pfn = rg_start + rg_size;
}
-- 
1.7.4.1
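
Editor's note on [PATCH 1/5]: the idea of remembering where the covered
range really starts can be sketched in userspace C as follows. This is a
minimal sketch, not the driver's code: the sk_ names, the 4 KB page size and
the single-chunk region are assumptions made for illustration. Only the
field names mirror struct hv_hotadd_state from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed for illustration: 4 KB pages, so 128 MB is 32768 PFNs. */
#define SK_HA_CHUNK_PFNS (128UL * 1024 * 1024 / 4096)

/* Hypothetical, cut-down stand-in for struct hv_hotadd_state. */
struct sk_ha_region {
	unsigned long start_pfn;         /* 128 MB aligned start of the region */
	unsigned long covered_start_pfn; /* first PFN the host really backed */
	unsigned long covered_end_pfn;   /* one past the last backed PFN */
	unsigned long end_pfn;           /* aligned end of the region */
};

/* Create a one-chunk region around a request that starts at pg_start. */
static struct sk_ha_region sk_ha_region_create(unsigned long pg_start)
{
	struct sk_ha_region r;

	r.start_pfn = pg_start - (pg_start % SK_HA_CHUNK_PFNS);
	r.end_pfn = r.start_pfn + SK_HA_CHUNK_PFNS;
	/* Nothing is covered yet; both bounds start at the requested PFN. */
	r.covered_start_pfn = pg_start;
	r.covered_end_pfn = pg_start;
	return r;
}

/* Record that the host backed pfn_cnt more pages at the end of the range. */
static void sk_ha_region_cover(struct sk_ha_region *r, unsigned long pfn_cnt)
{
	assert(r != NULL);
	r->covered_end_pfn += pfn_cnt;
	if (r->covered_end_pfn > r->end_pfn)
		r->covered_end_pfn = r->end_pfn;
}

/* Only pages inside [covered_start_pfn, covered_end_pfn) may be onlined. */
static bool sk_ha_can_online(const struct sk_ha_region *r, unsigned long pfn)
{
	assert(r != NULL);
	return pfn >= r->covered_start_pfn && pfn < r->covered_end_pfn;
}
```

With a request that starts 100 pages into a chunk, the aligned start_pfn lies
before the first backed page, so a check against start_pfn alone would accept
pages that are not present, while the covered range rejects them.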

[PATCH 0/5] Drivers: hv: balloon: Miscellaneous fixes.

2016-08-24 Thread kys

From: K. Y. Srinivasan 

Miscellaneous fixes to the balloon driver.

Alex Ng (1):
  Drivers: hv: balloon: Use available memory value in pressure report

Vitaly Kuznetsov (4):
  Drivers: hv: balloon: keep track of where ha_region starts
  Drivers: hv: balloon: account for gaps in hot add regions
  Drivers: hv: balloon: don't wait for ol_waitevent when
memhp_auto_online is enabled
  Drivers: hv: balloon: replace ha_region_mutex with spinlock

 drivers/hv/hv_balloon.c |  254 ++-
 1 files changed, 161 insertions(+), 93 deletions(-)

-- 
1.7.4.1

[PATCH 2/2] Drivers: hv: utils: Check VSS daemon is listening before a hot backup

2016-08-18 Thread kys

From: Alex Ng 

The Hyper-V host sends a VSS_OP_HOT_BACKUP request to check whether the
guest is ready for a live backup/snapshot. The driver should respond to the
check only if the daemon is running and listening for requests. This allows
the host to fall back to standard snapshots when the VSS daemon is not
running.

Signed-off-by: Alex Ng 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_snapshot.c |9 ++---
 tools/hv/hv_vss_daemon.c |3 +++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 4c8dd20..4cfd854 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -136,6 +136,11 @@ static int vss_on_msg(void *msg, int len)
return vss_handle_handshake(vss_msg);
} else if (vss_transaction.state == HVUTIL_USERSPACE_REQ) {
vss_transaction.state = HVUTIL_USERSPACE_RECV;
+
+   if (vss_msg->vss_hdr.operation == VSS_OP_HOT_BACKUP)
+   vss_transaction.msg->vss_cf.flags =
+   VSS_HBU_NO_AUTO_RECOVERY;
+
if (cancel_delayed_work_sync(&vss_timeout_work)) {
vss_respond_to_host(vss_msg->error);
/* Transaction is finished, reset the state. */
@@ -195,6 +200,7 @@ static void vss_handle_request(struct work_struct *dummy)
 */
case VSS_OP_THAW:
case VSS_OP_FREEZE:
+   case VSS_OP_HOT_BACKUP:
if (vss_transaction.state < HVUTIL_READY) {
/* Userspace is not registered yet */
vss_respond_to_host(HV_E_FAIL);
@@ -203,9 +209,6 @@ static void vss_handle_request(struct work_struct *dummy)
vss_transaction.state = HVUTIL_HOSTMSG_RECEIVED;
vss_send_op();
return;
-   case VSS_OP_HOT_BACKUP:
-   vss_transaction.msg->vss_cf.flags = VSS_HBU_NO_AUTO_RECOVERY;
-   break;
case VSS_OP_GET_DM_INFO:
vss_transaction.msg->dm_info.flags = 0;
break;
diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
index 5d51d6f..e082980 100644
--- a/tools/hv/hv_vss_daemon.c
+++ b/tools/hv/hv_vss_daemon.c
@@ -250,6 +250,9 @@ int main(int argc, char *argv[])
syslog(LOG_ERR, "/etc/fstab and /proc/mounts");
}
break;
+   case VSS_OP_HOT_BACKUP:
+   syslog(LOG_INFO, "VSS: op=CHECK HOT BACKUP\n");
+   break;
default:
syslog(LOG_ERR, "Illegal op:%d\n", op);
}
-- 
1.7.4.1
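
Editor's note on [PATCH 2/2]: the dispatch rule after this change can be
sketched as below. This is a simplified model under stated assumptions: the
sk_ enums and the transaction struct are hypothetical stand-ins for the
driver's VSS_OP_* codes and HVUTIL_* states, and the sketch returns the
decision instead of talking to the host or the daemon.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's operation codes and states. */
enum sk_vss_op { SK_OP_FREEZE, SK_OP_THAW, SK_OP_HOT_BACKUP, SK_OP_GET_DM_INFO };
enum sk_vss_state { SK_DEVICE_INIT, SK_READY, SK_HOSTMSG_RECEIVED };
enum sk_vss_action { SK_FAIL_TO_HOST, SK_FORWARD_TO_DAEMON, SK_RESPOND_OK };

struct sk_vss_txn {
	enum sk_vss_op op;       /* operation requested by the host */
	enum sk_vss_state state; /* current state of the userspace handshake */
};

/* Decide what to do with a host request. */
static enum sk_vss_action sk_vss_dispatch(const struct sk_vss_txn *txn)
{
	assert(txn != NULL);

	switch (txn->op) {
	case SK_OP_THAW:
	case SK_OP_FREEZE:
	case SK_OP_HOT_BACKUP:
		/* These need the daemon; fail early if it never registered. */
		if (txn->state < SK_READY)
			return SK_FAIL_TO_HOST;
		return SK_FORWARD_TO_DAEMON;
	case SK_OP_GET_DM_INFO:
	default:
		/* Answered by the driver itself. */
		return SK_RESPOND_OK;
	}
}
```

The point of grouping the hot backup check with freeze and thaw is that a
missing daemon now produces a failure, which lets the host choose a standard
snapshot instead.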

[PATCH 1/2] Drivers: hv: utils: Continue to poll VSS channel after handling requests.

2016-08-18 Thread kys

From: Alex Ng 

Multiple VSS_OP_HOT_BACKUP requests may arrive in quick succession, even
though the host only signals once. The driver was handling the first
request while ignoring the others in the ring buffer. We should poll the
VSS channel after handling a request to continue processing other requests.

Signed-off-by: Alex Ng 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_snapshot.c |   89 +
 1 files changed, 42 insertions(+), 47 deletions(-)

diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 3fba14e..4c8dd20 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -67,11 +67,11 @@ static const char vss_devname[] = "vmbus/hv_vss";
 static __u8 *recv_buffer;
 static struct hvutil_transport *hvt;
 
-static void vss_send_op(struct work_struct *dummy);
 static void vss_timeout_func(struct work_struct *dummy);
+static void vss_handle_request(struct work_struct *dummy);
 
 static DECLARE_DELAYED_WORK(vss_timeout_work, vss_timeout_func);
-static DECLARE_WORK(vss_send_op_work, vss_send_op);
+static DECLARE_WORK(vss_handle_request_work, vss_handle_request);
 
 static void vss_poll_wrapper(void *channel)
 {
@@ -150,8 +150,7 @@ static int vss_on_msg(void *msg, int len)
return 0;
 }
 
-
-static void vss_send_op(struct work_struct *dummy)
+static void vss_send_op(void)
 {
int op = vss_transaction.msg->vss_hdr.operation;
int rc;
@@ -168,6 +167,8 @@ static void vss_send_op(struct work_struct *dummy)
vss_msg->vss_hdr.operation = op;
 
vss_transaction.state = HVUTIL_USERSPACE_REQ;
+
+   schedule_delayed_work(&vss_timeout_work, VSS_USERSPACE_TIMEOUT);
rc = hvutil_transport_send(hvt, vss_msg, sizeof(*vss_msg));
if (rc) {
pr_warn("VSS: failed to communicate to the daemon: %d\n", rc);
@@ -182,6 +183,40 @@ static void vss_send_op(struct work_struct *dummy)
return;
 }
 
+static void vss_handle_request(struct work_struct *dummy)
+{
+   switch (vss_transaction.msg->vss_hdr.operation) {
+   /*
+* Initiate a "freeze/thaw" operation in the guest.
+* We respond to the host once the operation is complete.
+*
+* We send the message to the user space daemon and the operation is
+* performed in the daemon.
+*/
+   case VSS_OP_THAW:
+   case VSS_OP_FREEZE:
+   if (vss_transaction.state < HVUTIL_READY) {
+   /* Userspace is not registered yet */
+   vss_respond_to_host(HV_E_FAIL);
+   return;
+   }
+   vss_transaction.state = HVUTIL_HOSTMSG_RECEIVED;
+   vss_send_op();
+   return;
+   case VSS_OP_HOT_BACKUP:
+   vss_transaction.msg->vss_cf.flags = VSS_HBU_NO_AUTO_RECOVERY;
+   break;
+   case VSS_OP_GET_DM_INFO:
+   vss_transaction.msg->dm_info.flags = 0;
+   break;
+   default:
+   break;
+   }
+
+   vss_respond_to_host(0);
+   hv_poll_channel(vss_transaction.recv_channel, vss_poll_wrapper);
+}
+
 /*
  * Send a response back to the host.
  */
@@ -266,48 +301,8 @@ void hv_vss_onchannelcallback(void *context)
vss_transaction.recv_req_id = requestid;
vss_transaction.msg = (struct hv_vss_msg *)vss_msg;
 
-   switch (vss_msg->vss_hdr.operation) {
-   /*
-* Initiate a "freeze/thaw"
-* operation in the guest.
-* We respond to the host once
-* the operation is complete.
-*
-* We send the message to the
-* user space daemon and the
-* operation is performed in
-* the daemon.
-*/
-   case VSS_OP_FREEZE:
-   case VSS_OP_THAW:
-   if (vss_transaction.state < HVUTIL_READY) {
-   /* Userspace is not registered yet */
-   vss_respond_to_host(HV_E_FAIL);
-   return;
-   }
-   vss_transaction.state = HVUTIL_HOSTMSG_RECEIVED;
-   schedule_work(&vss_send_op_work);
-   schedule_delayed_work(&vss_timeout_work,
- VSS_USERSPACE_TIMEOUT);
-   return;
-
-   case VSS_OP_HOT_BACKUP:
-   vss_msg->vss_cf.flags =
-VSS_HBU_NO_AUTO_RECOVERY;
-
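
Editor's note on [PATCH 1/2]: the difference between handling one request
per host signal and polling until the ring is empty can be sketched as
follows. This is an assumption-laden model, not the driver: the sk_ring type
and its functions are hypothetical, and the sketch drains synchronously,
whereas the driver polls the channel again after responding to the host.

```c
#include <assert.h>
#include <stddef.h>

#define SK_RING_SLOTS 8

/* Hypothetical stand-in for the channel's inbound ring buffer. */
struct sk_ring {
	int req[SK_RING_SLOTS];
	size_t head; /* next slot to read */
	size_t tail; /* next slot to write */
};

/* Queue a request; returns 0 on success, -1 if the ring is full. */
static int sk_ring_push(struct sk_ring *r, int req)
{
	assert(r != NULL);
	if (r->tail - r->head == SK_RING_SLOTS)
		return -1;
	r->req[r->tail++ % SK_RING_SLOTS] = req;
	return 0;
}

/* Fetch one request; returns 1 if one was read, 0 if the ring is empty. */
static int sk_ring_pop(struct sk_ring *r, int *req)
{
	assert(r != NULL && req != NULL);
	if (r->head == r->tail)
		return 0;
	*req = r->req[r->head++ % SK_RING_SLOTS];
	return 1;
}

/* Behaviour described as the bug: one signal, one request handled. */
static size_t sk_on_signal_once(struct sk_ring *r)
{
	int req;

	return sk_ring_pop(r, &req) ? 1 : 0;
}

/* Behaviour after the fix: keep polling until the ring is drained. */
static size_t sk_on_signal_poll(struct sk_ring *r)
{
	size_t handled = 0;
	int req;

	while (sk_ring_pop(r, &req))
		handled++; /* a real handler would respond to the host here */
	return handled;
}
```

If three requests are queued behind a single signal, the first variant
leaves two of them sitting in the ring, while the polling variant handles
all three.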

[PATCH 0/2] Drivers: hv: util: Some fixes to the backup driver

2016-08-18 Thread kys

From: K. Y. Srinivasan 

Some fixes to the backup driver.

Alex Ng (2):
  Drivers: hv: utils: Continue to poll VSS channel after handling
requests.
  Drivers: hv: utils: Check VSS daemon is listening before a hot backup

 drivers/hv/hv_snapshot.c |   92 ++---
 tools/hv/hv_vss_daemon.c |3 +
 2 files changed, 48 insertions(+), 47 deletions(-)

-- 
1.7.4.1

[PATCH 1/1] Tools: hv: kvp: ensure kvp device fd is closed on exec

2016-07-06 Thread kys

From: Vitaly Kuznetsov 

KVP daemon does fork()/exec() (with popen()) so we need to close our fds
to avoid sharing them with child processes. The immediate implication of
not doing so I see is SELinux complaining about 'ip' trying to access
'/dev/vmbus/hv_kvp'.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 tools/hv/hv_kvp_daemon.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 0d9f48e..bc7adb8 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1433,7 +1433,7 @@ int main(int argc, char *argv[])
openlog("KVP", 0, LOG_USER);
syslog(LOG_INFO, "KVP starting; pid is:%d", getpid());
 
-   kvp_fd = open("/dev/vmbus/hv_kvp", O_RDWR);
+   kvp_fd = open("/dev/vmbus/hv_kvp", O_RDWR | O_CLOEXEC);
 
if (kvp_fd < 0) {
syslog(LOG_ERR, "open /dev/vmbus/hv_kvp failed; error: %d %s",
-- 
1.7.4.1

[PATCH 1/1] Drivers: hv: Introduce a policy for controlling channel affinity

2016-07-06 Thread kys

From: K. Y. Srinivasan 

Introduce a mechanism to control how channels will be affinitized. We will
support two policies:

1. HV_BALANCED: All performance critical channels will be distributed
evenly amongst all the available NUMA nodes. Once the node is assigned,
we will assign the CPU based on a simple round robin scheme.

2. HV_LOCALIZED: Only the primary channels are distributed across all
NUMA nodes. Sub-channels will be in the same NUMA node as the primary
channel. This is the current behaviour.

The default policy will be HV_BALANCED, as it can minimize remote
memory access on NUMA machines with applications that span NUMA nodes.

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/channel_mgmt.c |   68 +---
 include/linux/hyperv.h|   23 +++
 2 files changed, 62 insertions(+), 29 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 8345869..aaa2c4b 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -338,8 +338,9 @@ void hv_process_channel_removal(struct vmbus_channel 
*channel, u32 relid)
 * We need to free the bit for init_vp_index() to work in the case
 * of sub-channel, when we reload drivers like hv_netvsc.
 */
-   cpumask_clear_cpu(channel->target_cpu,
- &primary_channel->alloced_cpus_in_node);
+   if (channel->affinity_policy == HV_LOCALIZED)
+   cpumask_clear_cpu(channel->target_cpu,
+ &primary_channel->alloced_cpus_in_node);
 
free_channel(channel);
 }
@@ -524,17 +525,17 @@ static void init_vp_index(struct vmbus_channel *channel, 
u16 dev_type)
}
 
/*
-* We distribute primary channels evenly across all the available
-* NUMA nodes and within the assigned NUMA node we will assign the
-* first available CPU to the primary channel.
-* The sub-channels will be assigned to the CPUs available in the
-* NUMA node evenly.
+* Based on the channel affinity policy, we will assign the NUMA
+* nodes.
 */
-   if (!primary) {
+
+   if ((channel->affinity_policy == HV_BALANCED) || (!primary)) {
while (true) {
next_node = next_numa_node_id++;
-   if (next_node == nr_node_ids)
+   if (next_node == nr_node_ids) {
next_node = next_numa_node_id = 0;
+   continue;
+   }
if (cpumask_empty(cpumask_of_node(next_node)))
continue;
break;
@@ -558,15 +559,17 @@ static void init_vp_index(struct vmbus_channel *channel, 
u16 dev_type)
 
cur_cpu = -1;
 
-   /*
-* Normally Hyper-V host doesn't create more subchannels than there
-* are VCPUs on the node but it is possible when not all present VCPUs
-* on the node are initialized by guest. Clear the alloced_cpus_in_node
-* to start over.
-*/
-   if (cpumask_equal(&primary->alloced_cpus_in_node,
- cpumask_of_node(primary->numa_node)))
-   cpumask_clear(&primary->alloced_cpus_in_node);
+   if (primary->affinity_policy == HV_LOCALIZED) {
+   /*
+* Normally Hyper-V host doesn't create more subchannels
+* than there are VCPUs on the node but it is possible when not
+* all present VCPUs on the node are initialized by guest.
+* Clear the alloced_cpus_in_node to start over.
+*/
+   if (cpumask_equal(&primary->alloced_cpus_in_node,
+ cpumask_of_node(primary->numa_node)))
+   cpumask_clear(&primary->alloced_cpus_in_node);
+   }
 
while (true) {
cur_cpu = cpumask_next(cur_cpu, &available_mask);
@@ -577,17 +580,24 @@ static void init_vp_index(struct vmbus_channel *channel, 
u16 dev_type)
continue;
}
 
-   /*
-* NOTE: in the case of sub-channel, we clear the sub-channel
-* related bit(s) in primary->alloced_cpus_in_node in
-* hv_process_channel_removal(), so when we reload drivers
-* like hv_netvsc in SMP guest, here we're able to re-allocate
-* bit from primary->alloced_cpus_in_node.
-*/
-   if (!cpumask_test_cpu(cur_cpu,
-   &primary->alloced_cpus_in_node)) {
-   cpumask_set_cpu(cur_cpu,
-   &primary->alloced_cpus_in_node);
+   if (primary->affinity_policy == HV_LOCALIZED) {
+   /*
+* NOTE: in the case of sub-channel, we clear the
+* sub-channel related bit(s) in
+*
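
Editor's note on the node selection loop: the round-robin walk over NUMA
nodes, including the wrap-around handling, can be sketched as below. The
names are hypothetical: node_has_cpus[] stands in for
cpumask_of_node()/cpumask_empty() and *next_node_id for the driver's
next_numa_node_id counter. The up-front guard against "no node has a CPU" is
an addition for the sketch and is not in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Pick the next NUMA node in round-robin order, skipping nodes without CPUs.
 * Returns the node index, or -1 if no node has a CPU.
 */
static int sk_pick_node(const bool node_has_cpus[], int nr_nodes,
			int *next_node_id)
{
	bool any = false;
	int i;

	assert(node_has_cpus != NULL && next_node_id != NULL && nr_nodes > 0);
	assert(*next_node_id >= 0 && *next_node_id <= nr_nodes);

	for (i = 0; i < nr_nodes; i++)
		any = any || node_has_cpus[i];
	if (!any)
		return -1; /* guard added for the sketch; avoids spinning forever */

	while (true) {
		int node = (*next_node_id)++;

		if (node == nr_nodes) {
			/* Wrap around and retry so the counter moves past node 0. */
			*next_node_id = 0;
			continue;
		}
		if (!node_has_cpus[node])
			continue;
		return node;
	}
}
```

One reading of the added "continue" in the diff is visible here: restarting
the loop after the wrap lets the post-increment run again, so the counter
ends at 1 rather than 0 and node 0 is not handed out twice in a row.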

[PATCH 1/4] Drivers: hv: cleanup vmbus_open() for wrap around mappings

2016-07-06 Thread kys

From: Vitaly Kuznetsov 

In preparation for doing wrap around mappings for ring buffers, clean up
the vmbus_open() function:
- check that ring sizes are PAGE_SIZE aligned (they are for all in-kernel
  drivers now);
- kfree(open_info) on error only after we kzalloc() it (not an issue, as it
  is valid to call kfree(NULL));
- rename poorly named labels;
- use alloc_pages() instead of __get_free_pages(), as we will need the
  struct page pointer in the future.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
Tested-by: Dexuan Cui 
---
 drivers/hv/channel.c |   43 +++
 1 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index e3a0048..901b6ce 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -81,6 +81,10 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
unsigned long t;
struct page *page;
 
+   if (send_ringbuffer_size % PAGE_SIZE ||
+   recv_ringbuffer_size % PAGE_SIZE)
+   return -EINVAL;
+
spin_lock_irqsave(&newchannel->lock, flags);
if (newchannel->state == CHANNEL_OPEN_STATE) {
newchannel->state = CHANNEL_OPENING_STATE;
@@ -100,17 +104,16 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
recv_ringbuffer_size));
 
if (!page)
-   out = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO,
-  get_order(send_ringbuffer_size +
-  recv_ringbuffer_size));
-   else
-   out = (void *)page_address(page);
+   page = alloc_pages(GFP_KERNEL|__GFP_ZERO,
+  get_order(send_ringbuffer_size +
+recv_ringbuffer_size));
 
-   if (!out) {
+   if (!page) {
err = -ENOMEM;
-   goto error0;
+   goto error_set_chnstate;
}
 
+   out = page_address(page);
in = (void *)((unsigned long)out + send_ringbuffer_size);
 
newchannel->ringbuffer_pages = out;
@@ -122,14 +125,14 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
 
if (ret != 0) {
err = ret;
-   goto error0;
+   goto error_free_pages;
}
 
ret = hv_ringbuffer_init(
&newchannel->inbound, in, recv_ringbuffer_size);
if (ret != 0) {
err = ret;
-   goto error0;
+   goto error_free_pages;
}
 
 
@@ -144,7 +147,7 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
 
if (ret != 0) {
err = ret;
-   goto error0;
+   goto error_free_pages;
}
 
/* Create and init the channel open message */
@@ -153,7 +156,7 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
   GFP_KERNEL);
if (!open_info) {
err = -ENOMEM;
-   goto error_gpadl;
+   goto error_free_gpadl;
}
 
init_completion(&open_info->waitevent);
@@ -169,7 +172,7 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
 
if (userdatalen > MAX_USER_DEFINED_BYTES) {
err = -EINVAL;
-   goto error_gpadl;
+   goto error_free_gpadl;
}
 
if (userdatalen)
@@ -185,13 +188,13 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
 
if (ret != 0) {
err = ret;
-   goto error1;
+   goto error_clean_msglist;
}
 
t = wait_for_completion_timeout(&open_info->waitevent, 5*HZ);
if (t == 0) {
err = -ETIMEDOUT;
-   goto error1;
+   goto error_clean_msglist;
}
 
spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
@@ -200,25 +203,25 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
 
if (open_info->response.open_result.status) {
err = -EAGAIN;
-   goto error_gpadl;
+   goto error_free_gpadl;
}
 
newchannel->state = CHANNEL_OPENED_STATE;
kfree(open_info);
return 0;
 
-error1:
+error_clean_msglist:
spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
list_del(&open_info->msglistentry);
spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
-error_gpadl:
+error_free_gpadl:
vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
-
-error0:
+   kfree(open_info);
+error_free_pages:
free_pages((unsigned long)out,
get_order(send_ringbuffer_size + recv_ringbuffer_size));
-   kfree(open_info);
+error_set_chnstate:
newchannel->state = CHANNEL_OPEN_STATE;
return err;
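
The relabelled error path above follows the usual kernel unwinding idiom: each label names the first thing it undoes, and the labels sit in reverse order of acquisition, so a failure jumps to exactly the cleanup it needs and falls through the rest. The user-space sketch below illustrates only that idiom. The names open_demo(), demo_alloc(), demo_free(), fail_at and live are hypothetical and do not appear in the patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Number of allocations currently outstanding (demo bookkeeping only). */
static int live;

static void *demo_alloc(int should_fail)
{
	void *p = should_fail ? NULL : malloc(16);

	if (p)
		live++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		free(p);
		live--;
	}
}

/*
 * Acquire three resources in order. fail_at (1..3) forces that step to
 * fail; 0 means every step succeeds. Whatever was acquired before the
 * failing step is released by falling through the labels below.
 */
int open_demo(int fail_at)
{
	void *pages, *gpadl, *info;
	int err = 0;

	pages = demo_alloc(fail_at == 1);
	if (!pages) {
		err = -ENOMEM;
		goto error_set_state;
	}

	gpadl = demo_alloc(fail_at == 2);
	if (!gpadl) {
		err = -ENOMEM;
		goto error_free_pages;
	}

	info = demo_alloc(fail_at == 3);
	if (!info) {
		err = -ENOMEM;
		goto error_free_gpadl;
	}

	/* Success: the demo releases everything so it can be re-run. */
	demo_free(info);
error_free_gpadl:
	demo_free(gpadl);
error_free_pages:
	demo_free(pages);
error_set_state:
	return err;
}
```

Unlike vmbus_open(), the demo also releases its resources on success, purely so that the leak counter can be checked after every call.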

[PATCH 4/4] Drivers: hv: ring_buffer: count on wrap around mappings in get_next_pkt_raw()

2016-07-06 Thread kys

From: Vitaly Kuznetsov 

With wrap around mappings in place we can always provide drivers with
direct links to packets on the ring buffer, even when they wrap around.
Do the required updates to get_next_pkt_raw()/put_pkt_raw().

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
Tested-by: Dexuan Cui 
---
 include/linux/hyperv.h |   32 +++-
 1 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 362acf0..897e4a7 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1466,31 +1466,23 @@ static inline struct vmpacket_descriptor *
 get_next_pkt_raw(struct vmbus_channel *channel)
 {
struct hv_ring_buffer_info *ring_info = &channel->inbound;
-   u32 read_loc = ring_info->priv_read_index;
+   u32 priv_read_loc = ring_info->priv_read_index;
void *ring_buffer = hv_get_ring_buffer(ring_info);
-   struct vmpacket_descriptor *cur_desc;
-   u32 packetlen;
u32 dsize = ring_info->ring_datasize;
-   u32 delta = read_loc - ring_info->ring_buffer->read_index;
+   /*
+* delta is the difference between what is available to read and
+* what was already consumed in place. We commit read index after
+* the whole batch is processed.
+*/
+   u32 delta = priv_read_loc >= ring_info->ring_buffer->read_index ?
+   priv_read_loc - ring_info->ring_buffer->read_index :
+   (dsize - ring_info->ring_buffer->read_index) + priv_read_loc;
u32 bytes_avail_toread = (hv_get_bytes_to_read(ring_info) - delta);
 
if (bytes_avail_toread < sizeof(struct vmpacket_descriptor))
return NULL;
 
-   if ((read_loc + sizeof(*cur_desc)) > dsize)
-   return NULL;
-
-   cur_desc = ring_buffer + read_loc;
-   packetlen = cur_desc->len8 << 3;
-
-   /*
-* If the packet under consideration is wrapping around,
-* return failure.
-*/
-   if ((read_loc + packetlen + VMBUS_PKT_TRAILER) > (dsize - 1))
-   return NULL;
-
-   return cur_desc;
+   return ring_buffer + priv_read_loc;
 }
 
 /*
@@ -1502,16 +1494,14 @@ static inline void put_pkt_raw(struct vmbus_channel 
*channel,
struct vmpacket_descriptor *desc)
 {
struct hv_ring_buffer_info *ring_info = &channel->inbound;
-   u32 read_loc = ring_info->priv_read_index;
u32 packetlen = desc->len8 << 3;
u32 dsize = ring_info->ring_datasize;
 
-   if ((read_loc + packetlen + VMBUS_PKT_TRAILER) > dsize)
-   BUG();
/*
 * Include the packet trailer.
 */
ring_info->priv_read_index += packetlen + VMBUS_PKT_TRAILER;
+   ring_info->priv_read_index %= dsize;
 }
 
 /*
-- 
1.7.4.1
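
The index arithmetic in this patch can be checked in isolation. Once the private read index is allowed to wrap, a plain unsigned subtraction from the committed read index underflows, which is why delta becomes a conditional expression and put_pkt_raw() reduces the index modulo the data size. The sketch below mirrors those two computations in user space. The function names are hypothetical, and the trailer size is passed in as a parameter rather than assumed.

```c
#include <stdint.h>

/*
 * Bytes already consumed in place but not yet committed: the distance
 * from the committed read index to the private read index, taken
 * modulo the size of the ring data area.
 */
uint32_t pending_delta(uint32_t priv_read, uint32_t read_index,
		       uint32_t dsize)
{
	return priv_read >= read_index ?
		priv_read - read_index :
		(dsize - read_index) + priv_read;
}

/*
 * Advance the private read index past one packet and its trailer,
 * wrapping at the end of the data area.
 */
uint32_t advance_priv_read(uint32_t priv_read, uint32_t pkt_len,
			   uint32_t trailer, uint32_t dsize)
{
	return (priv_read + pkt_len + trailer) % dsize;
}
```

For a 4096-byte data area, a committed index of 4000 and a private index of 10 give a delta of 106 bytes, where the old unconditional subtraction would have produced a value close to 2^32.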

[PATCH 2/4] Drivers: hv: ring_buffer: wrap around mappings for ring buffers

2016-07-06 Thread kys

From: Vitaly Kuznetsov 

Make it possible to always use a single memcpy() or to provide a direct
link to a packet on the ring buffer by creating a virtual mapping for two
copies of the ring buffer with vmap(). Utilize the currently empty
hv_ringbuffer_cleanup() to do the unmap.

While at it, replace the sizeof(struct hv_ring_buffer) check
in hv_ringbuffer_init() with BUILD_BUG_ON(), as it is a compile-time check.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
Tested-by: Dexuan Cui 
---
 drivers/hv/channel.c  |   29 ++---
 drivers/hv/hyperv_vmbus.h |4 ++--
 drivers/hv/ring_buffer.c  |   39 +--
 3 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 901b6ce..aad26da 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -75,7 +75,6 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
 {
struct vmbus_channel_open_channel *open_msg;
struct vmbus_channel_msginfo *open_info = NULL;
-   void *in, *out;
unsigned long flags;
int ret, err = 0;
unsigned long t;
@@ -113,23 +112,21 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
goto error_set_chnstate;
}
 
-   out = page_address(page);
-   in = (void *)((unsigned long)out + send_ringbuffer_size);
-
-   newchannel->ringbuffer_pages = out;
+   newchannel->ringbuffer_pages = page_address(page);
newchannel->ringbuffer_pagecount = (send_ringbuffer_size +
   recv_ringbuffer_size) >> PAGE_SHIFT;
 
-   ret = hv_ringbuffer_init(
-   &newchannel->outbound, out, send_ringbuffer_size);
+   ret = hv_ringbuffer_init(&newchannel->outbound, page,
+send_ringbuffer_size >> PAGE_SHIFT);
 
if (ret != 0) {
err = ret;
goto error_free_pages;
}
 
-   ret = hv_ringbuffer_init(
-   &newchannel->inbound, in, recv_ringbuffer_size);
+   ret = hv_ringbuffer_init(&newchannel->inbound,
+&page[send_ringbuffer_size >> PAGE_SHIFT],
+recv_ringbuffer_size >> PAGE_SHIFT);
if (ret != 0) {
err = ret;
goto error_free_pages;
@@ -140,10 +137,10 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 
send_ringbuffer_size,
newchannel->ringbuffer_gpadlhandle = 0;
 
ret = vmbus_establish_gpadl(newchannel,
-newchannel->outbound.ring_buffer,
-send_ringbuffer_size +
-recv_ringbuffer_size,
-&newchannel->ringbuffer_gpadlhandle);
+   page_address(page),
+   send_ringbuffer_size +
+   recv_ringbuffer_size,
+   &newchannel->ringbuffer_gpadlhandle);
 
if (ret != 0) {
err = ret;
@@ -219,8 +216,10 @@ error_free_gpadl:
vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
kfree(open_info);
 error_free_pages:
-   free_pages((unsigned long)out,
-   get_order(send_ringbuffer_size + recv_ringbuffer_size));
+   hv_ringbuffer_cleanup(&newchannel->outbound);
+   hv_ringbuffer_cleanup(&newchannel->inbound);
+   __free_pages(page,
+get_order(send_ringbuffer_size + recv_ringbuffer_size));
 error_set_chnstate:
newchannel->state = CHANNEL_OPEN_STATE;
return err;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index ddcc348..a5b4442 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -522,8 +522,8 @@ extern unsigned int host_info_edx;
 /* Interface */
 
 
-int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info, void *buffer,
-  u32 buflen);
+int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 pagecnt);
 
 void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);
 
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index e3edcae..7e21c2c 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -243,22 +245,46 @@ void hv_ringbuffer_get_debuginfo(struct 
hv_ring_buffer_info *ring_info,
 
 /* Initialize the ring buffer. */
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-  void *buffer, u32 buflen)
+  struct page *pages, u32 page_cnt)
 {
-   if (sizeof(struct hv_ring_buffer) != PAGE_SIZE)
-   return -EINVAL;
+   int i;
+   struct page **pages_wraparound;
+
+   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
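
The diff above is cut off before the mapping itself is built, so the following is only a sketch of the layout the commit message describes, not the patch's code. It assumes that the first page holds the ring header (consistent with the BUILD_BUG_ON() requiring struct hv_ring_buffer to be exactly one page) and is mapped once, while the remaining data pages are mapped twice back to back. It computes page indices only and does not call vmap(). The function name wraparound_order() is hypothetical.

```c
#include <stddef.h>

/*
 * Fill order[] with the physical page index backing each virtual page
 * of a wrap-around mapping: header page 0 once, then the data pages
 * 1..page_cnt-1 twice in a row. order[] must have room for
 * 2 * page_cnt - 1 entries. Returns the number of entries written,
 * or 0 if there are no data pages to map.
 */
size_t wraparound_order(size_t page_cnt, size_t *order)
{
	size_t data_pages, i;

	if (page_cnt < 2)
		return 0;

	data_pages = page_cnt - 1;
	order[0] = 0;
	for (i = 0; i < 2 * data_pages; i++)
		order[i + 1] = 1 + i % data_pages;

	return 2 * page_cnt - 1;
}
```

With four pages the virtual layout is 0, 1, 2, 3, 1, 2, 3, so a packet that starts in page 3 and runs past the end of the ring continues straight into the second mapping of page 1.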

[PATCH 3/4] Drivers: hv: ring_buffer: use wrap around mappings in hv_copy{from,to}_ringbuffer()

2016-07-06 Thread kys

From: Vitaly Kuznetsov 

With wrap around mappings for ring buffers we can always use a single
memcpy() to do the job.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
Tested-by: Dexuan Cui 
---
 drivers/hv/ring_buffer.c |   24 +++-
 1 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 7e21c2c..08043da 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -172,18 +172,7 @@ static u32 hv_copyfrom_ringbuffer(
void *ring_buffer = hv_get_ring_buffer(ring_info);
u32 ring_buffer_size = hv_get_ring_buffersize(ring_info);
 
-   u32 frag_len;
-
-   /* wrap-around detected at the src */
-   if (destlen > ring_buffer_size - start_read_offset) {
-   frag_len = ring_buffer_size - start_read_offset;
-
-   memcpy(dest, ring_buffer + start_read_offset, frag_len);
-   memcpy(dest + frag_len, ring_buffer, destlen - frag_len);
-   } else
-
-   memcpy(dest, ring_buffer + start_read_offset, destlen);
-
+   memcpy(dest, ring_buffer + start_read_offset, destlen);
 
start_read_offset += destlen;
start_read_offset %= ring_buffer_size;
@@ -204,15 +193,8 @@ static u32 hv_copyto_ringbuffer(
 {
void *ring_buffer = hv_get_ring_buffer(ring_info);
u32 ring_buffer_size = hv_get_ring_buffersize(ring_info);
-   u32 frag_len;
-
-   /* wrap-around detected! */
-   if (srclen > ring_buffer_size - start_write_offset) {
-   frag_len = ring_buffer_size - start_write_offset;
-   memcpy(ring_buffer + start_write_offset, src, frag_len);
-   memcpy(ring_buffer, src + frag_len, srclen - frag_len);
-   } else
-   memcpy(ring_buffer + start_write_offset, src, srclen);
+
+   memcpy(ring_buffer + start_write_offset, src, srclen);
 
start_write_offset += srclen;
start_write_offset %= ring_buffer_size;
-- 
1.7.4.1
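
The effect of this patch on the copy path can be reproduced in user space. The sketch below places the removed two-fragment read next to the single memcpy() that replaces it. The double virtual mapping is simulated by a buffer of twice the ring size whose second half is kept identical to the first, which is an assumption made for the demo only. Both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Pre-patch style: copy in two fragments when the read crosses the end. */
void copy_from_ring_split(void *dest, const unsigned char *ring,
			  size_t ring_size, size_t off, size_t len)
{
	if (len > ring_size - off) {
		size_t frag = ring_size - off;

		memcpy(dest, ring + off, frag);
		memcpy((unsigned char *)dest + frag, ring, len - frag);
	} else {
		memcpy(dest, ring + off, len);
	}
}

/*
 * Post-patch style: ring2 is 2 * ring_size bytes whose second half
 * mirrors the first (standing in for the double mapping), so a read of
 * at most ring_size bytes starting below ring_size is contiguous.
 */
void copy_from_ring_mirrored(void *dest, const unsigned char *ring2,
			     size_t off, size_t len)
{
	memcpy(dest, ring2 + off, len);
}
```

Both functions return the same bytes for any offset inside the ring. The second one needs no branch and no fragment length.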

[PATCH 0/4] Drivers: hv: vmbus: Make in-place consumption always possible

2016-07-06 Thread kys

From: K. Y. Srinivasan 

Make in-place consumption of VMBus packets always possible. Currently we forbid
it when a packet 'wraps around' the ring, because we can't provide a single
pointer to it.

The idea of this series is dead simple: let's make a single virtual mapping
for two copies (actually, two sets of pages which make up the ring buffer)
of the ring buffer. With such a mapping we can always provide a pointer
for in-place consumption to drivers. The copy path can also benefit from such
mappings, as we eliminate the need for conditional checking in the copy_to/
copy_from functions and use a single memcpy().

Vitaly Kuznetsov (4):
  Drivers: hv: cleanup vmbus_open() for wrap around mappings
  Drivers: hv: ring_buffer: wrap around mappings for ring buffers
  Drivers: hv: ring_buffer: use wrap around mappings in
hv_copy{from,to}_ringbuffer()
  Drivers: hv: ring_buffer: count on wrap around mappings in
get_next_pkt_raw()

 drivers/hv/channel.c  |   68 +++--
 drivers/hv/hyperv_vmbus.h |4 +-
 drivers/hv/ring_buffer.c  |   61 +++-
 include/linux/hyperv.h|   32 +++--
 4 files changed, 83 insertions(+), 82 deletions(-)

-- 
1.7.4.1
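
To make the restriction that this series lifts concrete, the sketch below restates the wrap-around test that get_next_pkt_raw() applied before the series: a packet was handed out in place only if the packet plus its trailer ended before the last byte of the data area. The function name is hypothetical, and the trailer size is a parameter of the sketch rather than a value taken from the headers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pre-series rule for in-place consumption: true when the packet and
 * its trailer lie entirely before the last byte of the data area, so a
 * single pointer into the ring can describe the whole packet.
 */
bool fits_without_wrap(uint32_t read_loc, uint32_t pkt_len,
		       uint32_t trailer, uint32_t dsize)
{
	return read_loc + pkt_len + trailer <= dsize - 1;
}
```

With the double mapping in place this test is no longer needed, because the bytes that follow the end of the data area are the start of the same ring.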

[PATCH RESEND net-next] netvsc: Use the new in-place consumption APIs in the rx path

2016-07-05 Thread kys

From: K. Y. Srinivasan 

Use the new APIs for eliminating a copy on the receive path. These new APIs also
help in minimizing the number of memory barriers we end up issuing (in the
ringbuffer code) since we can better control when we want to expose the ring
state to the host.

The patch is being resent to address earlier email issues.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/netvsc.c |   88 +--
 1 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 719cb35..8cd4c19 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1141,6 +1141,39 @@ static inline void netvsc_receive_inband(struct 
hv_device *hdev,
}
 }
 
+static void netvsc_process_raw_pkt(struct hv_device *device,
+  struct vmbus_channel *channel,
+  struct netvsc_device *net_device,
+  struct net_device *ndev,
+  u64 request_id,
+  struct vmpacket_descriptor *desc)
+{
+   struct nvsp_message *nvmsg;
+
+   nvmsg = (struct nvsp_message *)((unsigned long)
+   desc + (desc->offset8 << 3));
+
+   switch (desc->type) {
+   case VM_PKT_COMP:
+   netvsc_send_completion(net_device, channel, device, desc);
+   break;
+
+   case VM_PKT_DATA_USING_XFER_PAGES:
+   netvsc_receive(net_device, channel, device, desc);
+   break;
+
+   case VM_PKT_DATA_INBAND:
+   netvsc_receive_inband(device, net_device, nvmsg);
+   break;
+
+   default:
+   netdev_err(ndev, "unhandled packet type %d, tid %llx\n",
+  desc->type, request_id);
+   break;
+   }
+}
+
+
 void netvsc_channel_cb(void *context)
 {
int ret;
@@ -1153,7 +1186,7 @@ void netvsc_channel_cb(void *context)
unsigned char *buffer;
int bufferlen = NETVSC_PACKET_SIZE;
struct net_device *ndev;
-   struct nvsp_message *nvmsg;
+   bool need_to_commit = false;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1167,39 +1200,36 @@ void netvsc_channel_cb(void *context)
buffer = get_per_channel_state(channel);
 
do {
+   desc = get_next_pkt_raw(channel);
+   if (desc != NULL) {
+   netvsc_process_raw_pkt(device,
+  channel,
+  net_device,
+  ndev,
+  desc->trans_id,
+  desc);
+
+   put_pkt_raw(channel, desc);
+   need_to_commit = true;
+   continue;
+   }
+   if (need_to_commit) {
+   need_to_commit = false;
+   commit_rd_index(channel);
+   }
+
ret = vmbus_recvpacket_raw(channel, buffer, bufferlen,
   &bytes_recvd, &request_id);
if (ret == 0) {
if (bytes_recvd > 0) {
desc = (struct vmpacket_descriptor *)buffer;
-   nvmsg = (struct nvsp_message *)((unsigned long)
-desc + (desc->offset8 << 3));
-   switch (desc->type) {
-   case VM_PKT_COMP:
-   netvsc_send_completion(net_device,
-   channel,
-   device, desc);
-   break;
-
-   case VM_PKT_DATA_USING_XFER_PAGES:
-   netvsc_receive(net_device, channel,
-  device, desc);
-   break;
-
-   case VM_PKT_DATA_INBAND:
-   netvsc_receive_inband(device,
- net_device,
- nvmsg);
-   break;
-
-   default:
-   netdev_err(ndev,
-  "unhandled packet type %d, "
-  "tid %llx len %d\n",
-  desc->type, request_id,
-  bytes_recvd);
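
The new loop in netvsc_channel_cb() consumes packets in place for as long as get_next_pkt_raw() returns one, and publishes the read index only once, after the batch. The toy model below illustrates that batching with fixed-size packets. The structure, its fields and both functions are hypothetical stand-ins, not the VMBus API, and the model leaves out the memory barriers that a real commit would need.

```c
#include <stdint.h>

struct demo_ring {
	uint32_t read_index;      /* committed index, visible to the host */
	uint32_t priv_read_index; /* guest-private cursor */
	uint32_t write_index;     /* where the host will write next */
	uint32_t dsize;           /* size of the data area in bytes */
	unsigned int commits;     /* times read_index was published */
};

/* Bytes between the private cursor and the write index. */
static uint32_t demo_avail(const struct demo_ring *r)
{
	return r->write_index >= r->priv_read_index ?
		r->write_index - r->priv_read_index :
		r->dsize - r->priv_read_index + r->write_index;
}

/*
 * Consume fixed-size packets in place and publish the read index once,
 * after the whole batch. Returns the number of packets consumed.
 */
unsigned int demo_drain(struct demo_ring *r, uint32_t pkt_size)
{
	unsigned int handled = 0;

	while (demo_avail(r) >= pkt_size) {
		/* A real driver would process the packet here. */
		r->priv_read_index =
			(r->priv_read_index + pkt_size) % r->dsize;
		handled++;
	}

	if (handled) {
		r->read_index = r->priv_read_index;
		r->commits++;
	}
	return handled;
}
```

Three packets drained in one call produce a single update of the shared index, which is the property the commit message relies on when it talks about controlling when the ring state is exposed to the host.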

[PATCH 2/3] Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg()

2016-07-01 Thread kys

From: K. Y. Srinivasan 

The current delay between retries is unnecessarily high and is negatively
affecting the time it takes to boot the system.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/connection.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index fcf8a02..78e6368 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -439,7 +439,7 @@ int vmbus_post_msg(void *buffer, size_t buflen)
union hv_connection_id conn_id;
int ret = 0;
int retries = 0;
-   u32 msec = 1;
+   u32 usec = 1;
 
conn_id.asu32 = 0;
conn_id.u.id = VMBUS_MESSAGE_CONNECTION_ID;
@@ -472,9 +472,9 @@ int vmbus_post_msg(void *buffer, size_t buflen)
}
 
retries++;
-   msleep(msec);
-   if (msec < 2048)
-   msec *= 2;
+   udelay(usec);
+   if (usec < 2048)
+   usec *= 2;
}
return ret;
 }
-- 
1.7.4.1
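
The patch keeps the doubling schedule and changes only the unit and the delay primitive. Note that udelay() busy-waits, whereas msleep() puts the caller to sleep. The sketch below adds up the delays that this schedule produces for a given number of retries. The function name is hypothetical, and the retry count is a parameter because the loop bound is not visible in the hunk.

```c
#include <stdint.h>

/*
 * Sum of the delays for a given number of retries, in whatever unit
 * the caller waits in: start at 1, double after each retry, and stop
 * doubling once the delay has reached 2048.
 */
uint64_t total_backoff(unsigned int retries)
{
	uint64_t total = 0;
	uint32_t delay = 1;
	unsigned int i;

	for (i = 0; i < retries; i++) {
		total += delay;
		if (delay < 2048)
			delay *= 2;
	}
	return total;
}
```

For a hypothetical 20 retries the total is 20479 units: roughly 20.5 seconds when the unit is milliseconds, and roughly 20.5 milliseconds when it is microseconds.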

[PATCH 1/3] Drivers: hv: vmbus: Enable explicit signaling policy for NIC channels

2016-07-01 Thread kys

From: K. Y. Srinivasan 

For synthetic NIC channels, enable explicit signaling policy as netvsc wants to
explicitly control when the host is to be signaled.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/channel.c  |   18 --
 drivers/hv/channel_mgmt.c |2 ++
 drivers/hv/hyperv_vmbus.h |3 ++-
 drivers/hv/ring_buffer.c  |   15 ---
 4 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index a68830c..da022d3 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -657,7 +657,7 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, 
void *buffer,
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
ret = hv_ringbuffer_write(&channel->outbound, bufferlist, num_vecs,
- &signal, lock);
+ &signal, lock, channel->signal_policy);
 
/*
 * Signalling the host is conditional on many factors:
@@ -678,11 +678,6 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, 
void *buffer,
 * mechanism which can hurt the performance otherwise.
 */
 
-   if (channel->signal_policy)
-   signal = true;
-   else
-   kick_q = true;
-
if (((ret == 0) && kick_q && signal) ||
(ret && !is_hvsock_channel(channel)))
vmbus_setevent(channel);
@@ -775,7 +770,7 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel 
*channel,
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 3,
- &signal, lock);
+ &signal, lock, channel->signal_policy);
 
/*
 * Signalling the host is conditional on many factors:
@@ -793,11 +788,6 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel 
*channel,
 * enough condition that it should not matter.
 */
 
-   if (channel->signal_policy)
-   signal = true;
-   else
-   kick_q = true;
-
if (((ret == 0) && kick_q && signal) || (ret))
vmbus_setevent(channel);
 
@@ -859,7 +849,7 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 3,
- &signal, lock);
+ &signal, lock, channel->signal_policy);
 
if (ret == 0 && signal)
vmbus_setevent(channel);
@@ -924,7 +914,7 @@ int vmbus_sendpacket_multipagebuffer(struct vmbus_channel 
*channel,
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 3,
- &signal, lock);
+ &signal, lock, channel->signal_policy);
 
if (ret == 0 && signal)
vmbus_setevent(channel);
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index b6c1211..8345869 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -406,6 +406,8 @@ static void vmbus_process_offer(struct vmbus_channel 
*newchannel)
}
 
dev_type = hv_get_dev_type(&newchannel->offermsg.offer.if_type);
+   if (dev_type == HV_NIC)
+   set_channel_signal_state(newchannel, HV_SIGNAL_POLICY_EXPLICIT);
 
init_vp_index(newchannel, dev_type);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index dfa9fac..ddcc348 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -529,7 +529,8 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info 
*ring_info);
 
 int hv_ringbuffer_write(struct hv_ring_buffer_info *ring_info,
struct kvec *kv_list,
-   u32 kv_count, bool *signal, bool lock);
+   u32 kv_count, bool *signal, bool lock,
+   enum hv_signal_policy policy);
 
 int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
   void *buffer, u32 buflen, u32 *buffer_actual_len,
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index fe586bf..e3edcae 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -66,12 +66,20 @@ u32 hv_end_read(struct hv_ring_buffer_info *rbi)
  *arrived.
  */
 
-static bool hv_need_to_signal(u32 old_write, struct hv_ring_buffer_info *rbi)
+static bool hv_need_to_signal(u32 old_write, struct hv_ring_buffer_info *rbi,
+ enum hv_signal_policy policy)
 {
virt_mb();
if (READ_ONCE(rbi->ring_buffer->interrupt_mask))
return false;
 
+   /*
+* When the client wants to control signaling,
+* we only honour the host interrupt mask.
+*/
+   if (policy == HV_SIGNAL_POLICY_EXPLICIT)
+   return true;
+
/* check interrupt_mask before read_index */
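
The hunk above stops right after the new policy check, so the sketch below reconstructs only the order of the decisions. The first two tests follow the diff: a set interrupt mask always suppresses the signal, and the explicit policy then signals unconditionally. The final test is an assumption, since the rest of the function is not shown: it signals only when the ring was empty before the write. The enum, its values and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_signal_policy {
	DEMO_POLICY_DEFAULT,
	DEMO_POLICY_EXPLICIT,
};

bool demo_need_to_signal(uint32_t interrupt_mask,
			 enum demo_signal_policy policy,
			 uint32_t old_write, uint32_t read_index)
{
	/* The host has masked interrupts: never signal. */
	if (interrupt_mask)
		return false;

	/* Explicit policy: the client decides, only the mask is honoured. */
	if (policy == DEMO_POLICY_EXPLICIT)
		return true;

	/*
	 * Assumed default rule (not visible in the truncated hunk):
	 * signal only if the ring was empty before this write.
	 */
	return old_write == read_index;
}
```

Passing the policy into the ring buffer code lets the mask check and the policy check happen in one place, instead of each send path overriding the signal and kick_q flags afterwards.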

[PATCH 2/3] Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg()

2016-07-01 Thread kys

From: K. Y. Srinivasan 

The current delay between retries is unnecessarily high and is negatively
affecting the time it takes to boot the system.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/connection.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index fcf8a02..78e6368 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -439,7 +439,7 @@ int vmbus_post_msg(void *buffer, size_t buflen)
union hv_connection_id conn_id;
int ret = 0;
int retries = 0;
-   u32 msec = 1;
+   u32 usec = 1;
 
conn_id.asu32 = 0;
conn_id.u.id = VMBUS_MESSAGE_CONNECTION_ID;
@@ -472,9 +472,9 @@ int vmbus_post_msg(void *buffer, size_t buflen)
}
 
retries++;
-   msleep(msec);
-   if (msec < 2048)
-   msec *= 2;
+   udelay(usec);
+   if (usec < 2048)
+   usec *= 2;
}
return ret;
 }
-- 
1.7.4.1
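
To make the effect of the unit change concrete, here is a minimal user-space sketch of the doubling schedule used in the retry loop above. The helper names next_delay() and total_delay() are hypothetical and are not part of the patch; only the "start at 1, double while below 2048" rule is taken from the diff. Under that rule ten retries add up to 1023 units, which is roughly one second when the unit is a millisecond (msleep) and roughly one millisecond when it is a microsecond (udelay). Note that udelay() busy-waits instead of sleeping.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): next delay in the schedule.
 * Mirrors "if (usec < 2048) usec *= 2;" from the diff, so the delay
 * doubles until it reaches 2048 and then stays there.
 */
static uint32_t next_delay(uint32_t cur)
{
	assert(cur > 0);	/* the schedule starts at 1, never at 0 */
	return (cur < 2048) ? cur * 2 : cur;
}

/*
 * Hypothetical helper: total time spent waiting across `retries`
 * attempts, in whatever unit the caller uses (msec before the patch,
 * usec after it).
 */
static uint64_t total_delay(unsigned int retries)
{
	uint64_t total = 0;
	uint32_t d = 1;		/* initial delay, as in the diff */

	for (unsigned int i = 0; i < retries; i++) {
		total += d;
		d = next_delay(d);
	}
	return total;
}
```

For example, total_delay(10) is 1023 and total_delay(13) is 6143, since the delay stops growing once it hits 2048.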

[PATCH 1/3] Drivers: hv: vmbus: Enable explicit signaling policy for NIC channels

2016-07-01 Thread kys

From: K. Y. Srinivasan 

For synthetic NIC channels, enable explicit signaling policy as netvsc wants to
explicitly control when the host is to be signaled.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/channel.c  |   18 --
 drivers/hv/channel_mgmt.c |2 ++
 drivers/hv/hyperv_vmbus.h |3 ++-
 drivers/hv/ring_buffer.c  |   15 ---
 4 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index a68830c..da022d3 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -657,7 +657,7 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
ret = hv_ringbuffer_write(&channel->outbound, bufferlist, num_vecs,
- &signal, lock);
+ &signal, lock, channel->signal_policy);
 
/*
 * Signalling the host is conditional on many factors:
@@ -678,11 +678,6 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
 * mechanism which can hurt the performance otherwise.
 */
 
-   if (channel->signal_policy)
-   signal = true;
-   else
-   kick_q = true;
-
if (((ret == 0) && kick_q && signal) ||
(ret && !is_hvsock_channel(channel)))
vmbus_setevent(channel);
@@ -775,7 +770,7 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 3,
- &signal, lock);
+ &signal, lock, channel->signal_policy);
 
/*
 * Signalling the host is conditional on many factors:
@@ -793,11 +788,6 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
 * enough condition that it should not matter.
 */
 
-   if (channel->signal_policy)
-   signal = true;
-   else
-   kick_q = true;
-
if (((ret == 0) && kick_q && signal) || (ret))
vmbus_setevent(channel);
 
@@ -859,7 +849,7 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 3,
- &signal, lock);
+ &signal, lock, channel->signal_policy);
 
if (ret == 0 && signal)
vmbus_setevent(channel);
@@ -924,7 +914,7 @@ int vmbus_sendpacket_multipagebuffer(struct vmbus_channel *channel,
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 3,
- &signal, lock);
+ &signal, lock, channel->signal_policy);
 
if (ret == 0 && signal)
vmbus_setevent(channel);
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index b6c1211..8345869 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -406,6 +406,8 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
}
 
dev_type = hv_get_dev_type(&newchannel->offermsg.offer.if_type);
+   if (dev_type == HV_NIC)
+   set_channel_signal_state(newchannel, HV_SIGNAL_POLICY_EXPLICIT);
 
init_vp_index(newchannel, dev_type);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index dfa9fac..ddcc348 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -529,7 +529,8 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);
 
 int hv_ringbuffer_write(struct hv_ring_buffer_info *ring_info,
struct kvec *kv_list,
-   u32 kv_count, bool *signal, bool lock);
+   u32 kv_count, bool *signal, bool lock,
+   enum hv_signal_policy policy);
 
 int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
   void *buffer, u32 buflen, u32 *buffer_actual_len,
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index fe586bf..e3edcae 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -66,12 +66,20 @@ u32 hv_end_read(struct hv_ring_buffer_info *rbi)
  *arrived.
  */
 
-static bool hv_need_to_signal(u32 old_write, struct hv_ring_buffer_info *rbi)
+static bool hv_need_to_signal(u32 old_write, struct hv_ring_buffer_info *rbi,
+ enum hv_signal_policy policy)
 {
virt_mb();
if (READ_ONCE(rbi->ring_buffer->interrupt_mask))
return false;
 
+   /*
+* When the client wants to control signaling,
+* we only honour the host interrupt mask.
+*/
+   if (policy == HV_SIGNAL_POLICY_EXPLICIT)
+   return true;
+
/* check interrupt_mask before read_index */
virt_rmb();
/*
@@ -264,7
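
The decision order introduced in hv_need_to_signal() can be sketched as a small user-space model. This is only an illustration under stated assumptions: the struct ring_model type, the enum signal_policy names and the function need_to_signal() are hypothetical stand-ins, the memory barriers are omitted, and the final "signal only on the empty to non-empty transition" check is an assumption about the part of the function that the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum signal_policy { POLICY_DEFAULT, POLICY_EXPLICIT };

struct ring_model {
	uint32_t interrupt_mask;	/* set by the host to suppress signals */
	uint32_t read_index;		/* host's current read position */
};

static bool need_to_signal(uint32_t old_write, const struct ring_model *rb,
			   enum signal_policy policy)
{
	assert(rb != NULL);

	/* The host interrupt mask is honoured under every policy. */
	if (rb->interrupt_mask)
		return false;

	/* Explicit policy: the client decides, so always report "signal". */
	if (policy == POLICY_EXPLICIT)
		return true;

	/*
	 * Assumption (not visible in the hunk): under the default policy,
	 * signal only if the ring was empty before this write.
	 */
	return old_write == rb->read_index;
}
```

The point of the ordering is that an explicit-policy channel can never override the host's interrupt mask; it only skips the ring-state heuristic.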

[PATCH 3/3] Drivers: hv: vmbus: Implement a mechanism to tag the channel for low latency

2016-07-01 Thread kys

From: K. Y. Srinivasan 

On Hyper-V, performance critical channels use the monitor
mechanism to signal the host when the guest posts messages
for the host. This mechanism minimizes the hypervisor intercepts
and also makes the host more efficient in that each time the
host is woken up, it processes a batch of messages as opposed to
just one. The goal here is to improve throughput, and this comes at
the expense of increased latency.
Implement a mechanism to let the client driver decide if latency
is important.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/channel.c   |7 ++-
 include/linux/hyperv.h |   35 +++
 2 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index da022d3..e3a0048 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -43,7 +43,12 @@ static void vmbus_setevent(struct vmbus_channel *channel)
 {
struct hv_monitor_page *monitorpage;
 
-   if (channel->offermsg.monitor_allocated) {
+   /*
+* For channels marked as in "low latency" mode
+* bypass the monitor page mechanism.
+*/
+   if ((channel->offermsg.monitor_allocated) &&
+   (!channel->low_latency)) {
/* Each u32 represents 32 channels */
sync_set_bit(channel->offermsg.child_relid & 31,
(unsigned long *) vmbus_connection.send_int_page +
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b10954a..362acf0 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -850,6 +850,31 @@ struct vmbus_channel {
 * ring lock to preserve the current behavior.
 */
bool acquire_ring_lock;
+   /*
+* For performance critical channels (storage, networking
+* etc.), Hyper-V has a mechanism to enhance the throughput
+* at the expense of latency:
+* When the host is to be signaled, we just set a bit in a shared page
+* and this bit will be inspected by the hypervisor within a certain
+* window and if the bit is set, the host will be signaled. The window
+* of time is the monitor latency - currently around 100 usecs. This
+* mechanism improves throughput by:
+*
+* A) Making the host more efficient - each time it wakes up,
+*potentially it will process more packets. The
+*monitor latency allows a batch to build up.
+* B) By deferring the hypercall to signal, we will also minimize
+*the interrupts.
+*
+* Clearly, these optimizations improve throughput at the expense of
+* latency. Furthermore, since the channel is shared for both
+* control and data messages, control messages currently suffer
+* unnecessary latency adversely impacting performance and boot
+* time. To fix this issue, permit tagging the channel as being
+* in "low latency" mode. In this mode, we will bypass the monitor
+* mechanism.
+*/
+   bool low_latency;
 
 };
 
@@ -891,6 +916,16 @@ static inline void set_channel_pending_send_size(struct vmbus_channel *c,
c->outbound.ring_buffer->pending_send_sz = size;
 }
 
+static inline void set_low_latency_mode(struct vmbus_channel *c)
+{
+   c->low_latency = true;
+}
+
+static inline void clear_low_latency_mode(struct vmbus_channel *c)
+{
+   c->low_latency = false;
+}
+
 void vmbus_onmessage(void *context);
 
 int vmbus_request_offers(void);
-- 
1.7.4.1
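
As a rough illustration of how a client driver might use the new flag, the sketch below models the path selection in vmbus_setevent() in user space. The struct channel_model type, the enum signal_path values and choose_path() are hypothetical; only the condition (monitor allocated and not low latency) and the two setter helpers follow the patch. What the non-monitor path does is not shown in the quoted hunk, so it is simply labelled "direct" here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two fields the patch tests. */
struct channel_model {
	bool monitor_allocated;	/* host offered a monitor page slot */
	bool low_latency;	/* new flag added by this patch */
};

/* Hypothetical labels for the two ways of notifying the host. */
enum signal_path { PATH_MONITOR, PATH_DIRECT };

/* Mirrors set_low_latency_mode() / clear_low_latency_mode() in the diff. */
static inline void set_low_latency_mode(struct channel_model *c)
{
	c->low_latency = true;
}

static inline void clear_low_latency_mode(struct channel_model *c)
{
	c->low_latency = false;
}

/* Same condition as the patched vmbus_setevent(). */
static enum signal_path choose_path(const struct channel_model *c)
{
	assert(c != NULL);
	if (c->monitor_allocated && !c->low_latency)
		return PATH_MONITOR;	/* batched, delayed by monitor latency */
	return PATH_DIRECT;		/* monitor mechanism bypassed */
}
```

A driver that shares one channel for control and data traffic could, under this model, set the flag around latency-sensitive control exchanges and clear it again before bulk transfers resume.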

[PATCH 0/3] Drivers: hv: vmbus: Miscellaneous adjustments

2016-07-01 Thread kys

From: K. Y. Srinivasan 

Some miscellaneous adjustments to the vmbus driver.

K. Y. Srinivasan (3):
  Drivers: hv: vmbus: Enable explicit signaling policy for NIC channels
  Drivers: hv: vmbus: Reduce the delay between retries in
vmbus_post_msg()
  Drivers: hv: vmbus: Implement a mechanism to tag the channel for low
latency

 drivers/hv/channel.c  |   25 ++---
 drivers/hv/channel_mgmt.c |2 ++
 drivers/hv/connection.c   |8 
 drivers/hv/hyperv_vmbus.h |3 ++-
 drivers/hv/ring_buffer.c  |   15 ---
 include/linux/hyperv.h|   35 +++
 6 files changed, 65 insertions(+), 23 deletions(-)

-- 
1.7.4.1
