RE: [PATCH v4 1/1] crypto: add virtio-crypto driver

2016-11-30 Thread Gonglei (Arei)
Hi Stefan,

> 
> On Tue, Nov 29, 2016 at 08:48:14PM +0800, Gonglei wrote:
> > diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c
> b/drivers/crypto/virtio/virtio_crypto_algs.c
> > new file mode 100644
> > index 000..08b077f
> > --- /dev/null
> > +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> > @@ -0,0 +1,518 @@
> > + /* Algorithms supported by virtio crypto device
> > +  *
> > +  * Authors: Gonglei 
> > +  *
> > +  * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> > +  *
> > +  * This program is free software; you can redistribute it and/or modify
> > +  * it under the terms of the GNU General Public License as published by
> > +  * the Free Software Foundation; either version 2 of the License, or
> > +  * (at your option) any later version.
> > +  *
> > +  * This program is distributed in the hope that it will be useful,
> > +  * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +  * GNU General Public License for more details.
> > +  *
> > +  * You should have received a copy of the GNU General Public License
> > +  * along with this program; if not, see .
> > +  */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include "virtio_crypto_common.h"
> > +
> > +static DEFINE_MUTEX(algs_lock);
> 
> Did you run checkpatch.pl?  I think it encourages you to document what
> the lock protects.
> 
Sure. Basically I run checkpatch.pl each time. :)

# ./scripts/checkpatch.pl 0001-crypto-add-virtio-crypto-driver.patch 
total: 0 errors, 0 warnings, 1873 lines checked

0001-crypto-add-virtio-crypto-driver.patch has no obvious style problems and is 
ready for submission.

> > +static int virtio_crypto_alg_ablkcipher_init_session(
> > +   struct virtio_crypto_ablkcipher_ctx *ctx,
> > +   uint32_t alg, const uint8_t *key,
> > +   unsigned int keylen,
> > +   int encrypt)
> > +{
> > +   struct scatterlist outhdr, key_sg, inhdr, *sgs[3];
> > +   unsigned int tmp;
> > +   struct virtio_crypto *vcrypto = ctx->vcrypto;
> > +   int op = encrypt ? VIRTIO_CRYPTO_OP_ENCRYPT :
> VIRTIO_CRYPTO_OP_DECRYPT;
> > +   int err;
> > +   unsigned int num_out = 0, num_in = 0;
> > +
> > +   /*
> > +* Avoid to do DMA from the stack, switch to using
> > +* dynamically-allocated for the key
> > +*/
> > +   uint8_t *cipher_key = kmalloc(keylen, GFP_ATOMIC);
> > +
> > +   if (!cipher_key)
> > +   return -ENOMEM;
> > +
> > +   memcpy(cipher_key, key, keylen);
> 
> Are there any rules on handling key material in the kernel?  This buffer
> is just kfreed later.  Do you need to zero it out before freeing it?
> 
Good question. In the kernel crypto core, each cipher request should be freed
by skcipher_request_free(), which zeroizes and frees the request data structure.

I need to use kzfree() for the key as well. I'll also check the other buffers. Thanks.
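
The zeroize-before-free pattern discussed here can be sketched in userspace C. This is only an illustration under stated assumptions: secure_wipe(), secure_free() and dup_key() are hypothetical names, and malloc()/free() stand in for the kernel allocator; it is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Overwrite a buffer with zeros. Writing through a volatile pointer
 * keeps the compiler from dropping the stores as dead code.
 * (Hypothetical helper, not a kernel API.)
 */
static void secure_wipe(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	while (len--)
		*p++ = 0;
}

/* Userspace stand-in for the wipe-then-free behaviour of kzfree(). */
static void secure_free(void *buf, size_t len)
{
	if (!buf)
		return;
	secure_wipe(buf, len);
	free(buf);
}

/* Copy caller-supplied key material into a heap buffer. */
static unsigned char *dup_key(const unsigned char *key, size_t keylen)
{
	unsigned char *copy;

	assert(key != NULL && keylen > 0);
	copy = malloc(keylen);
	if (copy)
		memcpy(copy, key, keylen);
	return copy;
}
```

The caller pairs every dup_key() with a secure_free() of the same length, so the key bytes do not linger in freed memory.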

> > +
> > +   spin_lock(&vcrypto->ctrl_lock);
> 
> The QAT accelerator driver doesn't spin while talking to the device in
> virtio_crypto_alg_ablkcipher_init_session().  I didn't find any other
> driver examples in the kernel tree, but this function seems like a
> weakness in the virtio-crypto device.
> 
The control queues of virtio-net and virtio-console are also locked.
Please see __send_control_msg() in virtio_console.c; virtio-net's control
queue is protected by the rtnl lock.

The lock is not meant to protect session creation, only the virtqueue
operations, as other virtio devices do.

> While QEMU is servicing the create session command this vcpu is blocked.
> The QEMU global mutex is held so no other vcpu can enter QEMU and the
> QMP monitor is also blocked.
> 
> This is a scalability and performance problem.  Can you look at how QAT
> avoids this synchronous session setup?

For the QAT driver, session creation is synchronous as well, because it's a
plain software operation that completes quickly.

Regards,
-Gonglei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info

2016-11-30 Thread Dave Hansen
On 11/30/2016 12:43 AM, Liang Li wrote:
> +static void send_unused_pages_info(struct virtio_balloon *vb,
> + unsigned long req_id)
> +{
> + struct scatterlist sg_in;
> + unsigned long pos = 0;
> + struct virtqueue *vq = vb->req_vq;
> + struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
> + int ret, order;
> +
> + mutex_lock(&vb->balloon_lock);
> +
> + for (order = MAX_ORDER - 1; order >= 0; order--) {

I scratched my head for a bit on this one.  Why are you walking over
orders, *then* zones?  I *think* you're doing it because you can
efficiently fill the bitmaps at a given order for all zones, then move
to a new bitmap.  But, it would be interesting to document this.

> + pos = 0;
> + ret = get_unused_pages(vb->resp_data,
> +  vb->resp_buf_size / sizeof(unsigned long),
> +  order, &pos);

FWIW, get_unused_pages() is a pretty bad name.  "get" usually implies
bumping reference counts or consuming something.  You're just
"recording" or "marking" them.

> + if (ret == -ENOSPC) {
> + void *new_resp_data;
> +
> + new_resp_data = kmalloc(2 * vb->resp_buf_size,
> + GFP_KERNEL);
> + if (new_resp_data) {
> + kfree(vb->resp_data);
> + vb->resp_data = new_resp_data;
> + vb->resp_buf_size *= 2;

What happens to the data in ->resp_data at this point?  Doesn't this
just throw it away?
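
As posted, the loop resets pos and retries the same order after a successful grow, so the old contents are regenerated rather than copied. If the contents did need to survive, the grow step would have to copy them over first. A minimal userspace sketch, where struct resp_buf and resp_buf_grow() are hypothetical stand-ins for the resp_data / resp_buf_size pair and malloc() stands in for kmalloc():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver's response buffer fields. */
struct resp_buf {
	unsigned long *data;
	size_t size;	/* bytes allocated */
	size_t used;	/* bytes already filled */
};

/*
 * Double the buffer and carry over what was already recorded.
 * Returns 0 on success, -1 if the allocation fails; on failure the
 * old buffer is left untouched.
 */
static int resp_buf_grow(struct resp_buf *rb)
{
	unsigned long *bigger;

	assert(rb != NULL && rb->size > 0 && rb->used <= rb->size);
	bigger = malloc(2 * rb->size);
	if (!bigger)
		return -1;
	if (rb->used)
		memcpy(bigger, rb->data, rb->used);
	free(rb->data);
	rb->data = bigger;
	rb->size *= 2;
	return 0;
}
```

Copying only the used bytes keeps the cost proportional to what has been recorded so far, not to the allocation size.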

...
> +struct page_info_item {
> + __le64 start_pfn : 52; /* start pfn for the bitmap */
> + __le64 page_shift : 6; /* page shift width, in bytes */
> + __le64 bmap_len : 6;  /* bitmap length, in bytes */
> +};

Is 'bmap_len' too short?  a 64-byte buffer is a bit tiny.  Right?
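
To make the limit concrete: a 6-bit field holds values 0..63, so it cannot carry the byte length of a bitmap as large as BALLOON_BMAP_SIZE (8 * PAGE_SIZE in patch 3/5, i.e. 32768 bytes if 4 KiB pages are assumed). A small sketch of the arithmetic; field_max() and field_fits() are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Largest value an unsigned bit-field of the given width can hold. */
static uint64_t field_max(unsigned int bits)
{
	assert(bits > 0 && bits < 64);
	return (UINT64_C(1) << bits) - 1;
}

/* Nonzero if value can be stored in a field of the given width. */
static int field_fits(uint64_t value, unsigned int bits)
{
	return value <= field_max(bits);
}
```

With these helpers, field_max(6) is 63 and field_fits(32768, 6) is false, which is the concern raised above.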

> +static int  mark_unused_pages(struct zone *zone,
> + unsigned long *unused_pages, unsigned long size,
> + int order, unsigned long *pos)
> +{
> + unsigned long pfn, flags;
> + unsigned int t;
> + struct list_head *curr;
> + struct page_info_item *info;
> +
> + if (zone_is_empty(zone))
> + return 0;
> +
> + spin_lock_irqsave(&zone->lock, flags);
> +
> + if (*pos + zone->free_area[order].nr_free > size)
> + return -ENOSPC;

Urg, so this won't partially fill?  So, what's the nr_free pages limit
beyond which we no longer fit in the kmalloc()'d buffer and this simply
won't work?

> + for (t = 0; t < MIGRATE_TYPES; t++) {
> + list_for_each(curr, &zone->free_area[order].free_list[t]) {
> + pfn = page_to_pfn(list_entry(curr, struct page, lru));
> + info = (struct page_info_item *)(unused_pages + *pos);
> + info->start_pfn = pfn;
> + info->page_shift = order + PAGE_SHIFT;
> + *pos += 1;
> + }
> + }

Do we need to fill in ->bmap_len here?



WorldCIST'2017 - Workshops submission deadline - December 8

2016-11-30 Thread ML
-
WORKSHOPS
WorldCIST'17 - 5th World Conference on Information Systems and Technologies 
Porto Santo Island, Madeira, Portugal
11th-13th of April 2017
http://www.worldcist.org/
-

WorldCIST 2017 will feature a total of 18 Workshops. Paper submission for all 
Workshops must be performed at 
https://easychair.org/conferences/?conf=worldcist_workshops2017 selecting the 
desired Workshop. Workshop papers (Full – 10 Pages and Short – 7 Pages) will be 
published by Springer AISC series and the authors of the best Workshop paper 
will be invited to extend their work for publication at top International 
Journals (indexed by ISI Web of Knowledge and SCOPUS). Paper submission is open 
until December 8th for all Workshops.

WORKSHOPS
•BIO - Business Intelligence in Organizations
•CMAIPA - Computational Methods and Applications for Image Processing and 
Analysis
•CSQA - Computer Supported Qualitative Analysis
•ESG - Educational and Serious Games
•ETCBPM - Emerging Trends and Challenges in Business Process Management
•HISISE - Workshop on Healthcare Information Systems Interoperability, 
Security and Efficiency
•HMInARMM - Human-Machine Interfaces in Automation, Robotics, Mechanics and 
Mechatronics
•ICDSS - Intelligent and Collaborative Decision Support Systems for 
Improving Manufacturing Processes
•ICTwithUAV - ICT solutions with Unmanned Aerial Vehicles
•IoT4Health - Workshop on Internet of Things for Health
•ISM - Intelligent Systems and Machines
•ISTA - Information Systems and Technologies Adoption
•MAMM - Managing Audiovisual Mass Media (governance, funding and 
innovation) and Mobile Journalism
•NPAT - New Pedagogical Approaches with Technologies
•PIS - Workshop on Pervasive Information Systems
•RSPPI - Resources Sharing between Private and Public Institutions
•SIdEWayS - Social Media World Sensors
•TinW - Technologies in the Workplace - Use and Impact on Workers

IMPORTANT DATES
• Deadline for paper submission: December 8th
• Notification of paper acceptance: December 28th, 2016
• Deadline for final versions and conference registration: January 8th, 2017
• Conference dates: April 11 -13, 2017

SUBMISSION AND PAPER FORMAT
Please Submit your paper at: 
https://easychair.org/conferences/?conf=worldcist_workshops2017
Two types of papers can be submitted to workshops (both will be published at 
the Springer AISC proceedings):
- Full papers: Finished or consolidated R&D works. These papers are assigned a 
10-page limit.
- Short papers: Finished or consolidated R&D works and also Ongoing work but 
with relevant preliminary results, open to discussion. These papers are 
assigned a 7-page limit.
Submitted papers must comply with the format of Advances in Intelligent Systems 
and Computing Series (see Instructions for Authors at Springer Website or 
download a DOC example) be written in English, must not have been published 
before, not be under review for any other conference, workshop or publication.
Paper should not include any information leading to the authors’ identification 
(in order to enable double blind review). Therefore, the authors’ names, 
affiliations and bibliographic references should not be included in the version 
for evaluation by the Program Committee. This information should only be 
included in the camera-ready version, saved in Word or Latex format and also in 
PDF format. These files must be accompanied by the Publication form filled out, 
in a ZIP file, and uploaded at the conference management system.
All papers will be subjected to a “double-blind review” by at least two/three 
members of the Program Committee. Based on Program Committee evaluation, a 
paper can be rejected or accepted by the Conference Chairs. In the latter case, 
it can be accepted as the type originally submitted or as another type. Thus, 
full papers can be accepted as short papers.

PUBLICATION AND INDEXING
Workshop papers will be published in the AISC Springer Conference Proceedings. 
To ensure that a paper is published in the Proceedings, at least one of the 
authors must be fully registered by 11th of January 2017, and the paper must 
comply with the suggested layout and page-limit. Additionally, all recommended 
changes must be addressed by the authors before they submit the camera-ready 
version. No more than one paper per registration will be published in the 
Conference Proceedings. An extra fee must be paid for publication of additional 
papers, with a maximum of one additional paper per registration. Full and short 
papers will be published in the Conference Proceedings by Springer, in Advances 
in Intelligent Systems and Computing. Published full and short papers will be 
submitted for indexation by ISI, EI-Compendex, SCOPUS and DBLP, among others, 
and will be available in the SpringerLink 

Re: [PATCH v4 1/1] crypto: add virtio-crypto driver

2016-11-30 Thread Stefan Hajnoczi
On Tue, Nov 29, 2016 at 08:48:14PM +0800, Gonglei wrote:
> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
> b/drivers/crypto/virtio/virtio_crypto_algs.c
> new file mode 100644
> index 000..08b077f
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> @@ -0,0 +1,518 @@
> + /* Algorithms supported by virtio crypto device
> +  *
> +  * Authors: Gonglei 
> +  *
> +  * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> +  *
> +  * This program is free software; you can redistribute it and/or modify
> +  * it under the terms of the GNU General Public License as published by
> +  * the Free Software Foundation; either version 2 of the License, or
> +  * (at your option) any later version.
> +  *
> +  * This program is distributed in the hope that it will be useful,
> +  * but WITHOUT ANY WARRANTY; without even the implied warranty of
> +  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +  * GNU General Public License for more details.
> +  *
> +  * You should have received a copy of the GNU General Public License
> +  * along with this program; if not, see .
> +  */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include "virtio_crypto_common.h"
> +
> +static DEFINE_MUTEX(algs_lock);

Did you run checkpatch.pl?  I think it encourages you to document what
the lock protects.

> +static int virtio_crypto_alg_ablkcipher_init_session(
> + struct virtio_crypto_ablkcipher_ctx *ctx,
> + uint32_t alg, const uint8_t *key,
> + unsigned int keylen,
> + int encrypt)
> +{
> + struct scatterlist outhdr, key_sg, inhdr, *sgs[3];
> + unsigned int tmp;
> + struct virtio_crypto *vcrypto = ctx->vcrypto;
> + int op = encrypt ? VIRTIO_CRYPTO_OP_ENCRYPT : VIRTIO_CRYPTO_OP_DECRYPT;
> + int err;
> + unsigned int num_out = 0, num_in = 0;
> +
> + /*
> +  * Avoid to do DMA from the stack, switch to using
> +  * dynamically-allocated for the key
> +  */
> + uint8_t *cipher_key = kmalloc(keylen, GFP_ATOMIC);
> +
> + if (!cipher_key)
> + return -ENOMEM;
> +
> + memcpy(cipher_key, key, keylen);

Are there any rules on handling key material in the kernel?  This buffer
is just kfreed later.  Do you need to zero it out before freeing it?

> +
> + spin_lock(&vcrypto->ctrl_lock);

The QAT accelerator driver doesn't spin while talking to the device in
virtio_crypto_alg_ablkcipher_init_session().  I didn't find any other
driver examples in the kernel tree, but this function seems like a
weakness in the virtio-crypto device.

While QEMU is servicing the create session command this vcpu is blocked.
The QEMU global mutex is held so no other vcpu can enter QEMU and the
QMP monitor is also blocked.

This is a scalability and performance problem.  Can you look at how QAT
avoids this synchronous session setup?



[PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info

2016-11-30 Thread Liang Li
This patch contains two parts:

One is to add a new API to mm to get the unused page information.
The virtio balloon driver will use this new API to get the
unused page info and send it to the hypervisor (QEMU) to speed up live
migration. While the bitmap is being sent, some of the pages may be
modified and used by the guest; this inaccuracy can be corrected by the
dirty page logging mechanism.

The other is to add support for the request for the VM's unused page
information. QEMU can make use of the unused page information and the
dirty page logging mechanism to skip the transfer of some of these unused
pages, which is very helpful for reducing network traffic and speeding
up the live migration process.

Signed-off-by: Liang Li 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 drivers/virtio/virtio_balloon.c | 126 +---
 include/linux/mm.h  |   3 +-
 mm/page_alloc.c |  72 +++
 3 files changed, 193 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index c3ddec3..2626cc0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -56,7 +56,7 @@
 
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *req_vq;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -75,6 +75,8 @@ struct virtio_balloon {
void *resp_hdr;
/* Pointer to the start address of response data. */
unsigned long *resp_data;
+   /* Size of response data buffer. */
+   unsigned long resp_buf_size;
/* Pointer offset of the response data. */
unsigned long resp_pos;
/* Bitmap and bitmap count used to tell the host the pages */
@@ -83,6 +85,8 @@ struct virtio_balloon {
unsigned int nr_page_bmap;
/* Used to record the processed pfn range */
unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
+   /* Request header */
+   struct virtio_balloon_req_hdr req_hdr;
/*
 * The pages we've told the Host we're not using are enqueued
 * at vb_dev_info->pages list.
@@ -551,6 +555,58 @@ static void update_balloon_stats(struct virtio_balloon *vb)
pages_to_bytes(available));
 }
 
+static void send_unused_pages_info(struct virtio_balloon *vb,
+   unsigned long req_id)
+{
+   struct scatterlist sg_in;
+   unsigned long pos = 0;
+   struct virtqueue *vq = vb->req_vq;
+   struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
+   int ret, order;
+
+   mutex_lock(&vb->balloon_lock);
+
+   for (order = MAX_ORDER - 1; order >= 0; order--) {
+   pos = 0;
+   ret = get_unused_pages(vb->resp_data,
+vb->resp_buf_size / sizeof(unsigned long),
+order, &pos);
+   if (ret == -ENOSPC) {
+   void *new_resp_data;
+
+   new_resp_data = kmalloc(2 * vb->resp_buf_size,
+   GFP_KERNEL);
+   if (new_resp_data) {
+   kfree(vb->resp_data);
+   vb->resp_data = new_resp_data;
+   vb->resp_buf_size *= 2;
+   order++;
+   continue;
+   } else
+   dev_warn(&vb->vdev->dev,
+"%s: omit some %d order pages\n",
+__func__, order);
+   }
+
+   if (pos > 0) {
+   vb->resp_pos = pos;
+   hdr->cmd = BALLOON_GET_UNUSED_PAGES;
+   hdr->id = req_id;
+   if (order > 0)
+   hdr->flag = BALLOON_FLAG_CONT;
+   else
+   hdr->flag = BALLOON_FLAG_DONE;
+
+   send_resp_data(vb, vq, true);
+   }
+   }
+
+   mutex_unlock(&vb->balloon_lock);
+   sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+   virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
+   virtqueue_kick(vq);
+}
+
 /*
  * While most virtqueues communicate guest-initiated requests to the 
hypervisor,
  * the stats queue operates in reverse.  The driver initializes the virtqueue
@@ -685,18 +741,56 @@ static void update_balloon_size_func(struct work_struct 
*work)
  

[PATCH kernel v5 3/5] virtio-balloon: speed up inflate/deflate process

2016-11-30 Thread Liang Li
The current virtio-balloon implementation is not very
efficient. The time spent on the different stages of inflating
the balloon to 7GB of an 8GB idle guest is:

a. allocating pages (6.5%)
b. sending PFNs to host (68.3%)
c. address translation (6.1%)
d. madvise (19%)

It takes about 4126ms for the inflating process to complete.
Debugging shows that the bottlenecks are stage b and stage d.

If we use a bitmap to send the page info instead of the PFNs, we
can reduce the overhead in stage b quite a lot. Furthermore, we
can do the address translation and call madvise() on a bulk of
RAM pages instead of page by page as is done now, so the overhead
of stage c and stage d can also be reduced a lot.

This patch is the kernel-side implementation, which is intended to
speed up the inflating & deflating process by adding a new feature
to the virtio-balloon device. With this new feature, inflating the
balloon to 7GB of an 8GB idle guest takes only 590ms; the
performance improvement is about 85%.
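
The 85% figure follows from the two timings quoted above, (4126 - 590) / 4126, which is the same as a speedup of roughly 7x. A tiny sketch of that arithmetic; time_reduction_pct() is a hypothetical helper name:

```c
#include <assert.h>

/* Percentage reduction in elapsed time going from before_ms to after_ms. */
static double time_reduction_pct(double before_ms, double after_ms)
{
	assert(before_ms > 0.0 && after_ms >= 0.0);
	return 100.0 * (before_ms - after_ms) / before_ms;
}
```

Calling time_reduction_pct(4126, 590) gives a value a little under 86, consistent with the "about 85%" stated in the commit message.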

TODO: optimize stage a by allocating/freeing a chunk of pages
instead of a single page at a time.

Signed-off-by: Liang Li 
Suggested-by: Michael S. Tsirkin 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 drivers/virtio/virtio_balloon.c | 395 +---
 1 file changed, 367 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f59cb4f..c3ddec3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -42,6 +42,10 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define BALLOON_BMAP_SIZE  (8 * PAGE_SIZE)
+#define PFNS_PER_BMAP  (BALLOON_BMAP_SIZE * BITS_PER_BYTE)
+#define BALLOON_BMAP_COUNT 32
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -67,6 +71,18 @@ struct virtio_balloon {
 
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+   /* Pointer to the response header. */
+   void *resp_hdr;
+   /* Pointer to the start address of response data. */
+   unsigned long *resp_data;
+   /* Pointer offset of the response data. */
+   unsigned long resp_pos;
+   /* Bitmap and bitmap count used to tell the host the pages */
+   unsigned long *page_bitmap[BALLOON_BMAP_COUNT];
+   /* Number of split page bitmaps */
+   unsigned int nr_page_bmap;
+   /* Used to record the processed pfn range */
+   unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
/*
 * The pages we've told the Host we're not using are enqueued
 * at vb_dev_info->pages list.
@@ -110,20 +126,228 @@ static void balloon_ack(struct virtqueue *vq)
wake_up(&vb->acked);
 }
 
-static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
+static inline void init_bmap_pfn_range(struct virtio_balloon *vb)
 {
-   struct scatterlist sg;
+   vb->min_pfn = ULONG_MAX;
+   vb->max_pfn = 0;
+}
+
+static inline void update_bmap_pfn_range(struct virtio_balloon *vb,
+struct page *page)
+{
+   unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+   vb->min_pfn = min(balloon_pfn, vb->min_pfn);
+   vb->max_pfn = max(balloon_pfn, vb->max_pfn);
+}
+
+static void extend_page_bitmap(struct virtio_balloon *vb,
+   unsigned long nr_pfn)
+{
+   int i, bmap_count;
+   unsigned long bmap_len;
+
+   bmap_len = ALIGN(nr_pfn, BITS_PER_LONG) / BITS_PER_BYTE;
+   bmap_len = ALIGN(bmap_len, BALLOON_BMAP_SIZE);
+   bmap_count = min((int)(bmap_len / BALLOON_BMAP_SIZE),
+BALLOON_BMAP_COUNT);
+
+   for (i = 1; i < bmap_count; i++) {
+   vb->page_bitmap[i] = kmalloc(BALLOON_BMAP_SIZE, GFP_KERNEL);
+   if (vb->page_bitmap[i])
+   vb->nr_page_bmap++;
+   else
+   break;
+   }
+}
+
+static void free_extended_page_bitmap(struct virtio_balloon *vb)
+{
+   int i, bmap_count = vb->nr_page_bmap;
+
+
+   for (i = 1; i < bmap_count; i++) {
+   kfree(vb->page_bitmap[i]);
+   vb->page_bitmap[i] = NULL;
+   vb->nr_page_bmap--;
+   }
+}
+
+static void kfree_page_bitmap(struct virtio_balloon *vb)
+{
+   int i;
+
+   for (i = 0; i < vb->nr_page_bmap; i++)
+   kfree(vb->page_bitmap[i]);
+}
+
+static void clear_page_bitmap(struct virtio_balloon *vb)
+{
+   int i;
+
+   for (i = 0; i < vb->nr_page_bmap; i++)
+   memset(vb->page_bitmap[i], 0, BALLOON_BMAP_SIZE);
+}
+
+static unsigned long 

[PATCH kernel v5 4/5] virtio-balloon: define flags and head for host request vq

2016-11-30 Thread Liang Li
Define the flags and head struct for a new host request virtual
queue. The guest can get requests from the host and then respond to
them on this new virtual queue.
The host can use this virtual queue to request that the guest perform
some operations, e.g. drop the page cache, synchronize the file system, etc.
The hypervisor can also get some of the guest's runtime information
through this virtual queue, e.g. the guest's unused page
information, which can be used for live migration optimization.

Signed-off-by: Liang Li 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 include/uapi/linux/virtio_balloon.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virtio_balloon.h
index 1be4b1f..5ac3a40 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_PAGE_BITMAP   3 /* Send page info with bitmap */
+#define VIRTIO_BALLOON_F_HOST_REQ_VQ   4 /* Host request virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -101,4 +102,25 @@ struct virtio_balloon_bmap_hdr {
__le64 bmap[0];
 };
 
+enum virtio_balloon_req_id {
+   /* Get unused page information */
+   BALLOON_GET_UNUSED_PAGES,
+};
+
+enum virtio_balloon_flag {
+   /* Have more data for a request */
+   BALLOON_FLAG_CONT,
+   /* No more data for a request */
+   BALLOON_FLAG_DONE,
+};
+
+struct virtio_balloon_req_hdr {
+   /* Used to distinguish different requests */
+   __le16 cmd;
+   /* Reserved */
+   __le16 reserved[3];
+   /* Request parameter */
+   __le64 param;
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.8.3.1



[PATCH kernel v5 2/5] virtio-balloon: define new feature bit and head struct

2016-11-30 Thread Liang Li
Add a new feature which supports sending the page information with
a bitmap. The current implementation uses a PFN array, which is not
very efficient. Using a bitmap can improve the performance of
inflating/deflating significantly.

The page bitmap header will be used to tell the host some information
about the page bitmap, e.g. the page size, page bitmap length and
start pfn.
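
For illustration, the three header fields can be packed into one 64-bit word with explicit shifts and masks. This is a sketch under assumptions, not the layout the patch defines: placing start_pfn in the low bits is a choice made here, the bmap_hdr_* names are hypothetical, and the patch itself uses C bit-fields, whose placement within the word is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths taken from the header description: 52 + 6 + 6 = 64 bits. */
#define START_PFN_BITS	52
#define PAGE_SHIFT_BITS	6
#define BMAP_LEN_BITS	6

/* Pack the fields; the bit positions are an assumption of this sketch. */
static uint64_t bmap_hdr_pack(uint64_t start_pfn, unsigned int page_shift,
			      unsigned int bmap_len)
{
	assert(start_pfn < (UINT64_C(1) << START_PFN_BITS));
	assert(page_shift < (1u << PAGE_SHIFT_BITS));
	assert(bmap_len < (1u << BMAP_LEN_BITS));
	return start_pfn |
	       ((uint64_t)page_shift << START_PFN_BITS) |
	       ((uint64_t)bmap_len << (START_PFN_BITS + PAGE_SHIFT_BITS));
}

static uint64_t bmap_hdr_start_pfn(uint64_t w)
{
	return w & ((UINT64_C(1) << START_PFN_BITS) - 1);
}

static unsigned int bmap_hdr_page_shift(uint64_t w)
{
	return (unsigned int)(w >> START_PFN_BITS) &
	       ((1u << PAGE_SHIFT_BITS) - 1);
}

static unsigned int bmap_hdr_bmap_len(uint64_t w)
{
	return (unsigned int)(w >> (START_PFN_BITS + PAGE_SHIFT_BITS)) &
	       ((1u << BMAP_LEN_BITS) - 1);
}
```

Explicit shifts make the on-wire layout independent of the compiler; the packed word would still need a CPU-to-little-endian conversion before being handed to the device.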

Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 include/uapi/linux/virtio_balloon.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virtio_balloon.h
index 343d7dd..1be4b1f 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,6 +34,7 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_PAGE_BITMAP   3 /* Send page info with bitmap */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -82,4 +83,22 @@ struct virtio_balloon_stat {
__virtio64 val;
 } __attribute__((packed));
 
+/* Response header structure */
+struct virtio_balloon_resp_hdr {
+   __le64 cmd : 8; /* Distinguish different requests type */
+   __le64 flag: 8; /* Mark status for a specific request type */
+   __le64 id : 16; /* Distinguish requests of a specific type */
+   __le64 data_len: 32; /* Length of the following data, in bytes */
+};
+
+/* Page bitmap header structure */
+struct virtio_balloon_bmap_hdr {
+   struct {
+   __le64 start_pfn : 52; /* start pfn for the bitmap */
+   __le64 page_shift : 6; /* page shift width, in bytes */
+   __le64 bmap_len : 6;  /* bitmap length, in bytes */
+   } head;
+   __le64 bmap[0];
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.8.3.1



[PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-11-30 Thread Liang Li
This patch set contains two parts of changes to the virtio-balloon.
 
One is the change for speeding up the inflating & deflating process.
The main idea of this optimization is to use a bitmap to send the page
information to the host instead of the PFNs, to reduce the overhead of
virtio data transmission, address translation and madvise(). This can
help to improve the performance by about 85%.
 
The other change is for speeding up live migration. By skipping the
guest's unused pages in the first round of data copy, we avoid needless
data processing, which can save quite a lot of CPU cycles and
network bandwidth. We put the guest's unused page information in a bitmap
and send it to the host through a virtqueue of virtio-balloon. For an idle
guest with 8GB RAM, this can help to shorten the total live migration
time from 2 seconds to about 500ms in a 10Gbps network environment.
 
Changes from v4 to v5:
* Drop the code to get the max_pfn, use another way instead.
* Simplify the API to get the unused page information from mm. 

Changes from v3 to v4:
* Use the new scheme suggested by Dave Hansen to encode the bitmap.
* Add code that was missing in v3 to handle page migration.
* Free the memory for the bitmap in time once the operation is done.
* Address some of the comments in v3.

Changes from v2 to v3:
* Change the name of 'free page' to 'unused page'.
* Use the scatter & gather bitmap instead of a 1MB page bitmap.
* Fix overwriting the page bitmap after kicking.
* Some of MST's comments for v2.
 
Changes from v1 to v2:
* Abandon the patch for dropping page cache.
* Move some structures to the uapi header file.
* Use a new way to determine the page bitmap size.
* Use a unified way to send the free page information with the bitmap.
* Address the issues raised in MST's comments.

Liang Li (5):
  virtio-balloon: rework deflate to add page to a list
  virtio-balloon: define new feature bit and head struct
  virtio-balloon: speed up inflate/deflate process
  virtio-balloon: define flags and head for host request vq
  virtio-balloon: tell host vm's unused page info

 drivers/virtio/virtio_balloon.c | 539 
 include/linux/mm.h  |   3 +-
 include/uapi/linux/virtio_balloon.h |  41 +++
 mm/page_alloc.c |  72 +
 4 files changed, 607 insertions(+), 48 deletions(-)

-- 
1.8.3.1



[PATCH kernel v5 1/5] virtio-balloon: rework deflate to add page to a list

2016-11-30 Thread Liang Li
When doing the inflating/deflating operation, the current virtio-balloon
implementation uses an array to save 256 PFNs, then sends these PFNs to
the host through virtio and processes each PFN one by one. This is not
efficient when inflating/deflating a large amount of memory because the
following operations are repeated too many times:

1. Virtio data transmission
2. Page allocate/free
3. Address translation(GPA->HVA)
4. madvise

The overhead of these operations consumes a lot of CPU cycles and
takes a long time to complete; it may impact the QoS of the guest as
well as the host. The overhead will be reduced a lot if batch processing
is used. E.g. if there are several pages whose addresses are physically
contiguous in the guest, these pages can be processed in one
operation.

The main idea of the optimization is to reduce the above operations as
much as possible. This can be achieved by using a bitmap instead of a
PFN array. Compared with a PFN array, for a buffer of a given size, a bitmap
can represent more pages, which is very important for batch processing.
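
A quick sketch of that comparison, assuming one 32-bit PFN per page in the array case (the driver's balloon PFNs are u32) and one bit per page in the bitmap case; both function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Pages describable by a buffer holding one 32-bit PFN per page. */
static size_t pages_per_pfn_array(size_t buf_bytes)
{
	return buf_bytes / sizeof(uint32_t);
}

/* Pages describable by the same buffer holding one bit per page. */
static size_t pages_per_bitmap(size_t buf_bytes)
{
	assert(buf_bytes <= SIZE_MAX / CHAR_BIT);
	return buf_bytes * CHAR_BIT;
}
```

For the 1024-byte buffer that holds the current 256 PFNs, the bitmap form covers 8192 page positions. That is an upper bound: a bitmap describes a contiguous PFN range, so the gain depends on how densely the ballooned pages fall within that range.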

Using a bitmap instead of PFNs is not very helpful when inflating/deflating
a small number of pages; in this case, using PFNs is better. But using a
bitmap will not impact the QoS of the guest or host heavily because the
operation completes very quickly for a small number of pages, and we
will use some methods to make sure the efficiency does not drop too much.

This patch saves the deflated pages to a list instead of the PFN array,
which will allow faster notifications using a bitmap down the road.
balloon_pfn_to_page() can be removed because it is no longer used.

Signed-off-by: Liang Li 
Signed-off-by: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 drivers/virtio/virtio_balloon.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 181793f..f59cb4f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -103,12 +103,6 @@ static u32 page_to_balloon_pfn(struct page *page)
return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE;
 }
 
-static struct page *balloon_pfn_to_page(u32 pfn)
-{
-   BUG_ON(pfn % VIRTIO_BALLOON_PAGES_PER_PAGE);
-   return pfn_to_page(pfn / VIRTIO_BALLOON_PAGES_PER_PAGE);
-}
-
 static void balloon_ack(struct virtqueue *vq)
 {
struct virtio_balloon *vb = vq->vdev->priv;
@@ -181,18 +175,16 @@ static unsigned fill_balloon(struct virtio_balloon *vb, 
size_t num)
return num_allocated_pages;
 }
 
-static void release_pages_balloon(struct virtio_balloon *vb)
+static void release_pages_balloon(struct virtio_balloon *vb,
+struct list_head *pages)
 {
-   unsigned int i;
-   struct page *page;
+   struct page *page, *next;
 
-   /* Find pfns pointing at start of each page, get pages and free them. */
-   for (i = 0; i < vb->num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   page = balloon_pfn_to_page(virtio32_to_cpu(vb->vdev,
-  vb->pfns[i]));
+   list_for_each_entry_safe(page, next, pages, lru) {
if (!virtio_has_feature(vb->vdev,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
adjust_managed_page_count(page, 1);
+   list_del(&page->lru);
put_page(page); /* balloon reference */
}
 }
@@ -202,6 +194,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, 
size_t num)
unsigned num_freed_pages;
struct page *page;
struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+   LIST_HEAD(pages);
 
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
@@ -215,6 +208,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, 
size_t num)
if (!page)
break;
set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+   list_add(&page->lru, &pages);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
 
@@ -226,7 +220,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, 
size_t num)
 */
if (vb->num_pfns != 0)
tell_host(vb, vb->deflate_vq);
-   release_pages_balloon(vb);
+   release_pages_balloon(vb, &pages);
mutex_unlock(&vb->balloon_lock);
return num_freed_pages;
 }
-- 
1.8.3.1
