Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-05-04 Thread Daniel P . Berrangé
On Mon, Apr 27, 2020 at 01:14:33PM +0100, Dr. David Alan Gilbert wrote:
> * Denis Plotnikov (dplotni...@virtuozzo.com) wrote:
> > The patch adds ability to qemu-file to write the data
> > asynchronously to improve the performance on writing.
> > Before, only synchronous writing was supported.
> > 
> > Enabling of the asyncronous mode is managed by new
> > "enabled_buffered" callback.
> 
> It's a bit invasive isn't it - changes a lot of functions in a lot of
> places!
> The multifd code separated the control headers from the data on separate
> fd's - but that doesn't help your case.
> 
> Is there any chance you could do this by using the existing 'save_page'
> hook (that RDMA uses).
> 
> In the cover letter you mention direct qemu_fflush calls - have we got a
> few too many in some palces that you think we can clean out?

When I first introduced the QIOChannel framework, I hoped that we could
largely eliminate QEMUFile as a concept.  Thus I'm a bit suspicious of
the idea of introducing more functionality to QEMUFile, especially as the
notion of buffering I/O is rather generic. Is there scope for having a
QIOChannelBuffered object for doing buffering. Would that provide better
isolation from the migration code and thus be less invasive/complex to
maintain ?


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-28 Thread Denis Plotnikov




On 28.04.2020 20:54, Dr. David Alan Gilbert wrote:

* Denis Plotnikov (dplotni...@virtuozzo.com) wrote:


On 27.04.2020 15:14, Dr. David Alan Gilbert wrote:

* Denis Plotnikov (dplotni...@virtuozzo.com) wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new
"enabled_buffered" callback.

It's a bit invasive isn't it - changes a lot of functions in a lot of
places!

If you mean changing the qemu-file code - yes, it is.

Yeh that's what I worry about; qemu-file is pretty complex as it is.
Especially when it then passes it to the channel code etc


If you mean changing the qemu-file usage in the code - no.
The only place to change is the snapshot code when the buffered mode is
enabled with a callback.
The change is in 03 patch of the series.

That's fine - that's easy.


The multifd code separated the control headers from the data on separate
fd's - but that doesn't help your case.

yes, that doesn't help

Is there any chance you could do this by using the existing 'save_page'
hook (that RDMA uses).

I don't think so. My goal is to improve writing performance of
the internal snapshot to qcow2 image. The snapshot is saved in qcow2 as
continuous stream placed in the end of address space.
To achieve the best writing speed I need a size and base-aligned buffer
containing the vm state (with ram) which looks like that (related to ram):

... | ram page header | ram page | ram page header | ram page | ... and so
on

to store the buffer in qcow2 with a single operation.

'save_page' would allow me not to store 'ram page' in the qemu-file internal
structures,
and write my own ram page storing logic. I think that wouldn't help me a lot
because:
1. I need a page with the ram page header
2. I want to reduce the number of io operations
3. I want to save other parts of vm state as fast as possible

May be I can't see the better way of using 'save page' callback.
Could you suggest anything?

I guess it depends if we care about keeping the format of the snapshot
the same here;  if we were open to changing it, then we could use
the save_page hook to delay the writes, so we'd have a pile of headers
followed by a pile of pages.


I think we have to care about keeping the format. Because many users 
already have internal snapshots
saved in the qcow2 images, if we change the format we can't load 
snapshots from those images
as well as make snapshots non-readable for older qemu-s or we need to 
support two versions of format

which I think is too complicated.




Denis

In the cover letter you mention direct qemu_fflush calls - have we got a
few too many in some palces that you think we can clean out?

I'm not sure that some of them are excessive. To the best of my knowlege,
qemu-file is used for the source-destination communication on migration
and removing some qemu_fflush-es may break communication logic.

I can't see any obvious places where it's called during the ram
migration; can you try and give me a hint to where you're seeing it ?


I think those qemu_fflush-es aren't in the ram migration rather than in 
other vm state parts.
Although, those parts are quite small in comparison to ram, I saw quite 
a lot of qemu_fflush-es while debugging.
Still, we could benefit saving them with fewer number of io operation if 
we going to use buffered mode.


Denis




Snapshot is just a special case (if not the only) when we know that we can
do buffered (cached)
writings. Do you know any other cases when the buffered (cached) mode could
be useful?

The RDMA code does it because it's really not good at small transfers,
but maybe generally it would be a good idea to do larger writes if
possible - something that multifd manages.

Dave


Dave


Signed-off-by: Denis Plotnikov 
---
   include/qemu/typedefs.h |   1 +
   migration/qemu-file.c   | 351 
+---
   migration/qemu-file.h   |   9 ++
   3 files changed, 339 insertions(+), 22 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 88dce54..9b388c8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
   typedef struct QemuConsole QemuConsole;
   typedef struct QEMUFile QEMUFile;
   typedef struct QEMUFileBuffer QEMUFileBuffer;
+typedef struct QEMUFileAioTask QEMUFileAioTask;
   typedef struct QemuLockable QemuLockable;
   typedef struct QemuMutex QemuMutex;
   typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 285c6ef..f42f949 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -29,19 +29,25 @@
   #include "qemu-file.h"
   #include "trace.h"
   #include "qapi/error.h"
+#include "block/aio_task.h"
-#define IO_BUF_SIZE 32768
+#define IO_BUF_SIZE (1024 * 1024)
   #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define IO_BUF_NUM 2
+#define 

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-28 Thread Dr. David Alan Gilbert
* Denis Plotnikov (dplotni...@virtuozzo.com) wrote:
> 
> 
> On 27.04.2020 15:14, Dr. David Alan Gilbert wrote:
> > * Denis Plotnikov (dplotni...@virtuozzo.com) wrote:
> > > The patch adds ability to qemu-file to write the data
> > > asynchronously to improve the performance on writing.
> > > Before, only synchronous writing was supported.
> > > 
> > > Enabling of the asyncronous mode is managed by new
> > > "enabled_buffered" callback.
> > It's a bit invasive isn't it - changes a lot of functions in a lot of
> > places!
> 
> If you mean changing the qemu-file code - yes, it is.

Yeh that's what I worry about; qemu-file is pretty complex as it is.
Especially when it then passes it to the channel code etc

> If you mean changing the qemu-file usage in the code - no.
> The only place to change is the snapshot code when the buffered mode is
> enabled with a callback.
> The change is in 03 patch of the series.

That's fine - that's easy.

> > The multifd code separated the control headers from the data on separate
> > fd's - but that doesn't help your case.
> 
> yes, that doesn't help
> > 
> > Is there any chance you could do this by using the existing 'save_page'
> > hook (that RDMA uses).
> 
> I don't think so. My goal is to improve writing performance of
> the internal snapshot to qcow2 image. The snapshot is saved in qcow2 as
> continuous stream placed in the end of address space.
> To achieve the best writing speed I need a size and base-aligned buffer
> containing the vm state (with ram) which looks like that (related to ram):
> 
> ... | ram page header | ram page | ram page header | ram page | ... and so
> on
> 
> to store the buffer in qcow2 with a single operation.
> 
> 'save_page' would allow me not to store 'ram page' in the qemu-file internal
> structures,
> and write my own ram page storing logic. I think that wouldn't help me a lot
> because:
> 1. I need a page with the ram page header
> 2. I want to reduce the number of io operations
> 3. I want to save other parts of vm state as fast as possible
> 
> May be I can't see the better way of using 'save page' callback.
> Could you suggest anything?

I guess it depends if we care about keeping the format of the snapshot
the same here;  if we were open to changing it, then we could use
the save_page hook to delay the writes, so we'd have a pile of headers
followed by a pile of pages.

> Denis
> > In the cover letter you mention direct qemu_fflush calls - have we got a
> > few too many in some palces that you think we can clean out?
> 
> I'm not sure that some of them are excessive. To the best of my knowlege,
> qemu-file is used for the source-destination communication on migration
> and removing some qemu_fflush-es may break communication logic.

I can't see any obvious places where it's called during the ram
migration; can you try and give me a hint to where you're seeing it ?

> Snapshot is just a special case (if not the only) when we know that we can
> do buffered (cached)
> writings. Do you know any other cases when the buffered (cached) mode could
> be useful?

The RDMA code does it because it's really not good at small transfers,
but maybe generally it would be a good idea to do larger writes if
possible - something that multifd manages.

Dave

> 
> > 
> > Dave
> > 
> > > Signed-off-by: Denis Plotnikov 
> > > ---
> > >   include/qemu/typedefs.h |   1 +
> > >   migration/qemu-file.c   | 351 
> > > +---
> > >   migration/qemu-file.h   |   9 ++
> > >   3 files changed, 339 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> > > index 88dce54..9b388c8 100644
> > > --- a/include/qemu/typedefs.h
> > > +++ b/include/qemu/typedefs.h
> > > @@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
> > >   typedef struct QemuConsole QemuConsole;
> > >   typedef struct QEMUFile QEMUFile;
> > >   typedef struct QEMUFileBuffer QEMUFileBuffer;
> > > +typedef struct QEMUFileAioTask QEMUFileAioTask;
> > >   typedef struct QemuLockable QemuLockable;
> > >   typedef struct QemuMutex QemuMutex;
> > >   typedef struct QemuOpt QemuOpt;
> > > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > > index 285c6ef..f42f949 100644
> > > --- a/migration/qemu-file.c
> > > +++ b/migration/qemu-file.c
> > > @@ -29,19 +29,25 @@
> > >   #include "qemu-file.h"
> > >   #include "trace.h"
> > >   #include "qapi/error.h"
> > > +#include "block/aio_task.h"
> > > -#define IO_BUF_SIZE 32768
> > > +#define IO_BUF_SIZE (1024 * 1024)
> > >   #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
> > > +#define IO_BUF_NUM 2
> > > +#define IO_BUF_ALIGNMENT 512
> > > -QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512));
> > > +QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, IO_BUF_ALIGNMENT));
> > > +QEMU_BUILD_BUG_ON(IO_BUF_SIZE > INT_MAX);
> > > +QEMU_BUILD_BUG_ON(IO_BUF_NUM <= 0);
> > >   struct QEMUFileBuffer {
> > >   int buf_index;
> > > -int buf_size; /* 0 when writing */
> > > +int 

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-28 Thread Denis Plotnikov




On 27.04.2020 15:14, Dr. David Alan Gilbert wrote:

* Denis Plotnikov (dplotni...@virtuozzo.com) wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new
"enabled_buffered" callback.

It's a bit invasive isn't it - changes a lot of functions in a lot of
places!


If you mean changing the qemu-file code - yes, it is.

If you mean changing the qemu-file usage in the code - no.
The only place to change is the snapshot code when the buffered mode is 
enabled with a callback.

The change is in 03 patch of the series.


The multifd code separated the control headers from the data on separate
fd's - but that doesn't help your case.


yes, that doesn't help


Is there any chance you could do this by using the existing 'save_page'
hook (that RDMA uses).


I don't think so. My goal is to improve writing performance of
the internal snapshot to qcow2 image. The snapshot is saved in qcow2 as
continuous stream placed in the end of address space.
To achieve the best writing speed I need a size and base-aligned buffer
containing the vm state (with ram) which looks like that (related to ram):

... | ram page header | ram page | ram page header | ram page | ... and 
so on


to store the buffer in qcow2 with a single operation.

'save_page' would allow me not to store 'ram page' in the qemu-file 
internal structures,
and write my own ram page storing logic. I think that wouldn't help me a 
lot because:

1. I need a page with the ram page header
2. I want to reduce the number of io operations
3. I want to save other parts of vm state as fast as possible

May be I can't see the better way of using 'save page' callback.
Could you suggest anything?

Denis

In the cover letter you mention direct qemu_fflush calls - have we got a
few too many in some palces that you think we can clean out?


I'm not sure that some of them are excessive. To the best of my knowlege,
qemu-file is used for the source-destination communication on migration
and removing some qemu_fflush-es may break communication logic.

Snapshot is just a special case (if not the only) when we know that we 
can do buffered (cached)
writings. Do you know any other cases when the buffered (cached) mode 
could be useful?




Dave


Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 351 +---
  migration/qemu-file.h   |   9 ++
  3 files changed, 339 insertions(+), 22 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 88dce54..9b388c8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
  typedef struct QemuConsole QemuConsole;
  typedef struct QEMUFile QEMUFile;
  typedef struct QEMUFileBuffer QEMUFileBuffer;
+typedef struct QEMUFileAioTask QEMUFileAioTask;
  typedef struct QemuLockable QemuLockable;
  typedef struct QemuMutex QemuMutex;
  typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 285c6ef..f42f949 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -29,19 +29,25 @@
  #include "qemu-file.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "block/aio_task.h"
  
-#define IO_BUF_SIZE 32768

+#define IO_BUF_SIZE (1024 * 1024)
  #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define IO_BUF_NUM 2
+#define IO_BUF_ALIGNMENT 512
  
-QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512));

+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, IO_BUF_ALIGNMENT));
+QEMU_BUILD_BUG_ON(IO_BUF_SIZE > INT_MAX);
+QEMU_BUILD_BUG_ON(IO_BUF_NUM <= 0);
  
  struct QEMUFileBuffer {

  int buf_index;
-int buf_size; /* 0 when writing */
+int buf_size; /* 0 when non-buffered writing */
  uint8_t *buf;
  unsigned long *may_free;
  struct iovec *iov;
  unsigned int iovcnt;
+QLIST_ENTRY(QEMUFileBuffer) link;
  };
  
  struct QEMUFile {

@@ -60,6 +66,22 @@ struct QEMUFile {
  bool shutdown;
  /* currently used buffer */
  QEMUFileBuffer *current_buf;
+/*
+ * with buffered_mode enabled all the data copied to 512 byte
+ * aligned buffer, including iov data. Then the buffer is passed
+ * to writev_buffer callback.
+ */
+bool buffered_mode;
+/* for async buffer writing */
+AioTaskPool *pool;
+/* the list of free buffers, currently used on is NOT there */
+QLIST_HEAD(, QEMUFileBuffer) free_buffers;
+};
+
+struct QEMUFileAioTask {
+AioTask task;
+QEMUFile *f;
+QEMUFileBuffer *fb;
  };
  
  /*

@@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops)
  f->opaque = opaque;
  f->ops = ops;
  
-f->current_buf = g_new0(QEMUFileBuffer, 1);

-f->current_buf->buf = g_malloc(IO_BUF_SIZE);
-f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
-

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-27 Thread Dr. David Alan Gilbert
* Denis Plotnikov (dplotni...@virtuozzo.com) wrote:
> The patch adds ability to qemu-file to write the data
> asynchronously to improve the performance on writing.
> Before, only synchronous writing was supported.
> 
> Enabling of the asyncronous mode is managed by new
> "enabled_buffered" callback.

It's a bit invasive isn't it - changes a lot of functions in a lot of
places!
The multifd code separated the control headers from the data on separate
fd's - but that doesn't help your case.

Is there any chance you could do this by using the existing 'save_page'
hook (that RDMA uses).

In the cover letter you mention direct qemu_fflush calls - have we got a
few too many in some palces that you think we can clean out?

Dave

> Signed-off-by: Denis Plotnikov 
> ---
>  include/qemu/typedefs.h |   1 +
>  migration/qemu-file.c   | 351 
> +---
>  migration/qemu-file.h   |   9 ++
>  3 files changed, 339 insertions(+), 22 deletions(-)
> 
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 88dce54..9b388c8 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
>  typedef struct QemuConsole QemuConsole;
>  typedef struct QEMUFile QEMUFile;
>  typedef struct QEMUFileBuffer QEMUFileBuffer;
> +typedef struct QEMUFileAioTask QEMUFileAioTask;
>  typedef struct QemuLockable QemuLockable;
>  typedef struct QemuMutex QemuMutex;
>  typedef struct QemuOpt QemuOpt;
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 285c6ef..f42f949 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -29,19 +29,25 @@
>  #include "qemu-file.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "block/aio_task.h"
>  
> -#define IO_BUF_SIZE 32768
> +#define IO_BUF_SIZE (1024 * 1024)
>  #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
> +#define IO_BUF_NUM 2
> +#define IO_BUF_ALIGNMENT 512
>  
> -QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512));
> +QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, IO_BUF_ALIGNMENT));
> +QEMU_BUILD_BUG_ON(IO_BUF_SIZE > INT_MAX);
> +QEMU_BUILD_BUG_ON(IO_BUF_NUM <= 0);
>  
>  struct QEMUFileBuffer {
>  int buf_index;
> -int buf_size; /* 0 when writing */
> +int buf_size; /* 0 when non-buffered writing */
>  uint8_t *buf;
>  unsigned long *may_free;
>  struct iovec *iov;
>  unsigned int iovcnt;
> +QLIST_ENTRY(QEMUFileBuffer) link;
>  };
>  
>  struct QEMUFile {
> @@ -60,6 +66,22 @@ struct QEMUFile {
>  bool shutdown;
>  /* currently used buffer */
>  QEMUFileBuffer *current_buf;
> +/*
> + * with buffered_mode enabled all the data copied to 512 byte
> + * aligned buffer, including iov data. Then the buffer is passed
> + * to writev_buffer callback.
> + */
> +bool buffered_mode;
> +/* for async buffer writing */
> +AioTaskPool *pool;
> +/* the list of free buffers, currently used on is NOT there */
> +QLIST_HEAD(, QEMUFileBuffer) free_buffers;
> +};
> +
> +struct QEMUFileAioTask {
> +AioTask task;
> +QEMUFile *f;
> +QEMUFileBuffer *fb;
>  };
>  
>  /*
> @@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const 
> QEMUFileOps *ops)
>  f->opaque = opaque;
>  f->ops = ops;
>  
> -f->current_buf = g_new0(QEMUFileBuffer, 1);
> -f->current_buf->buf = g_malloc(IO_BUF_SIZE);
> -f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
> -f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE);
> +if (f->ops->enable_buffered) {
> +f->buffered_mode = f->ops->enable_buffered(f->opaque);
> +}
> +
> +if (f->buffered_mode && qemu_file_is_writable(f)) {
> +int i;
> +/*
> + * in buffered_mode we don't use internal io vectors
> + * and may_free bitmap, because we copy the data to be
> + * written right away to the buffer
> + */
> +f->pool = aio_task_pool_new(IO_BUF_NUM);
> +
> +/* allocate io buffers */
> +for (i = 0; i < IO_BUF_NUM; i++) {
> +QEMUFileBuffer *fb = g_new0(QEMUFileBuffer, 1);
> +
> +fb->buf = qemu_memalign(IO_BUF_ALIGNMENT, IO_BUF_SIZE);
> +fb->buf_size = IO_BUF_SIZE;
> +
> +/*
> + * put the first buffer to the current buf and the rest
> + * to the list of free buffers
> + */
> +if (i == 0) {
> +f->current_buf = fb;
> +} else {
> +QLIST_INSERT_HEAD(>free_buffers, fb, link);
> +}
> +}
> +} else {
> +f->current_buf = g_new0(QEMUFileBuffer, 1);
> +f->current_buf->buf = g_malloc(IO_BUF_SIZE);
> +f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
> +f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE);
> +}
>  
>  return f;
>  }
> @@ -190,6 +244,8 @@ static void qemu_iovec_release_ram(QEMUFile *f)
>  unsigned long idx;
>  

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-27 Thread Vladimir Sementsov-Ogievskiy

27.04.2020 11:19, Denis Plotnikov wrote:



On 25.04.2020 12:10, Vladimir Sementsov-Ogievskiy wrote:

13.04.2020 14:12, Denis Plotnikov wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new
"enabled_buffered" callback.


Hmm.

I don't like resulting architecture very much:

1. Function naming is not clean: how can I understand that copy_buf is for 
buffered mode when add_to_iovec is +- same thing for non-buffered mode?

Yes, this is to be changed in the next patches


Hmm actually, you just alter several significant functions of QEMUFile - open, 
close, put, flush. In old mode we do one thing, in a new mode - absolutely 
another. This looks like a driver. So may be we want to add QEMUFileDriver 
struct, to define these functions as callbacks, move old realizations to 
default driver, and add new functionality as a new driver, what do you think?

Yes it looks like that, but on the other hand I can't imagine another driver to be added 
and changing the code to "the driver notation" would involve more code adding. 
So, should we really do that. Anyway, your suggestion looks cleaner.


2. Terminology: you say you add buffered mode, but actually qemu file is 
already work in buffered mode, so it should be clarified somehow..

The initial implementation uses mixed implementation, it has a buffer and an 
iovec array. The buffer is used to store *some* parts (ecluding RAM) of a vm 
state + service information. Each written to the buffer is added to the iovec 
array as a separate entry. RAM pages aren't added to the buffer, instead they 
are added to the iovec array directly without coping to the buffer. This is why 
we almost always get the iovec array consisting of size- and pointer- unaligned 
iovec-s and why we have the performance issues (more detailed in  of this 
series).
So, I can't say that the qemu-file has "real" buffered-mode.


But if someone will come and try to understand the code, it would be difficult 
because of such naming. If we keep it, we should add your description as 
comment to that name.


You also add asynchronisity, but old implementation has already 
qemu_put_buffer_async..

In my opinion, this function name is kind of confusing. What the function does 
is adding the buffer pointer to the internal iovec array without coping the 
buffer. It's not related to some asynchronous operation.

You use aio task pool, but don't say that it may be used only from coroutine.
May be, we'd better call it "coroutine-mode" ?

I don't think that mentioning any implementation-specific name is a good idea. 
aio_task_pool is a good option to implement async operation, but it can be any 
other interface.
I'd rather implicitly assert qemu_in_coroutine() when in "buffered-mode".



Also, am I correct that new mode can't be used for read? Should we document/assert it somehow? 

It can't just for now since read-buffered mode isn't implemented. Enabling 
buffered-mode for reading affects nothing - qemu-file works as before.
The same qemu-file instance can't  be opened for reading and writing at the 
same time. I'd add that assert on qemu-file open.


Ah, OK. I saw it, but thought somehow that enabled writing doesn't protect from 
reading.


Or may be support reading? Will switch to snapshot benefit if we implement 
reading in a new mode?

I thought about that. I think it could benefit making some kind of read-ahead to a number 
of buffers, while the "current" buffer is used to fill a vm state.
I didn't want to focus on the reading improvements because it's not that big 
problem in comparison to the writing. The qemu-file reading uses a buffer and 
fills  that buffer with a single io operation.


OK







Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 351 +---
  migration/qemu-file.h   |   9 ++
  3 files changed, 339 insertions(+), 22 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 88dce54..9b388c8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
  typedef struct QemuConsole QemuConsole;
  typedef struct QEMUFile QEMUFile;
  typedef struct QEMUFileBuffer QEMUFileBuffer;
+typedef struct QEMUFileAioTask QEMUFileAioTask;
  typedef struct QemuLockable QemuLockable;
  typedef struct QemuMutex QemuMutex;
  typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 285c6ef..f42f949 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -29,19 +29,25 @@
  #include "qemu-file.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "block/aio_task.h"
  -#define IO_BUF_SIZE 32768
+#define IO_BUF_SIZE (1024 * 1024)
  #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define IO_BUF_NUM 2


Interesting, how much is it better 

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-27 Thread Denis Plotnikov




On 25.04.2020 12:10, Vladimir Sementsov-Ogievskiy wrote:

13.04.2020 14:12, Denis Plotnikov wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new
"enabled_buffered" callback.


Hmm.

I don't like resulting architecture very much:

1. Function naming is not clean: how can I understand that copy_buf is 
for buffered mode when add_to_iovec is +- same thing for non-buffered 
mode?

Yes, this is to be changed in the next patches


Hmm actually, you just alter several significant functions of QEMUFile 
- open, close, put, flush. In old mode we do one thing, in a new mode 
- absolutely another. This looks like a driver. So may be we want to 
add QEMUFileDriver struct, to define these functions as callbacks, 
move old realizations to default driver, and add new functionality as 
a new driver, what do you think?
Yes it looks like that, but on the other hand I can't imagine another 
driver to be added and changing the code to "the driver notation" would 
involve more code adding. So, should we really do that. Anyway, your 
suggestion looks cleaner.


2. Terminology: you say you add buffered mode, but actually qemu file 
is already work in buffered mode, so it should be clarified somehow..
The initial implementation uses mixed implementation, it has a buffer 
and an iovec array. The buffer is used to store *some* parts (ecluding 
RAM) of a vm state + service information. Each written to the buffer is 
added to the iovec array as a separate entry. RAM pages aren't added to 
the buffer, instead they are added to the iovec array directly without 
coping to the buffer. This is why we almost always get the iovec array 
consisting of size- and pointer- unaligned iovec-s and why we have the 
performance issues (more detailed in  of this series).

So, I can't say that the qemu-file has "real" buffered-mode.
You also add asynchronisity, but old implementation has already 
qemu_put_buffer_async..
In my opinion, this function name is kind of confusing. What the 
function does is adding the buffer pointer to the internal iovec array 
without coping the buffer. It's not related to some asynchronous operation.
You use aio task pool, but don't say that it may be used only from 
coroutine.

May be, we'd better call it "coroutine-mode" ?
I don't think that mentioning any implementation-specific name is a good 
idea. aio_task_pool is a good option to implement async operation, but 
it can be any other interface.

I'd rather implicitly assert qemu_in_coroutine() when in "buffered-mode".



Also, am I correct that new mode can't be used for read? Should we 
document/assert it somehow? 
It can't just for now since read-buffered mode isn't implemented. 
Enabling buffered-mode for reading affects nothing - qemu-file works as 
before.
The same qemu-file instance can't  be opened for reading and writing at 
the same time. I'd add that assert on qemu-file open.
Or may be support reading? Will switch to snapshot benefit if we 
implement reading in a new mode?
I thought about that. I think it could benefit making some kind of 
read-ahead to a number of buffers, while the "current" buffer is used to 
fill a vm state.
I didn't want to focus on the reading improvements because it's not that 
big problem in comparison to the writing. The qemu-file reading uses a 
buffer and fills  that buffer with a single io operation.






Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 351 
+---

  migration/qemu-file.h   |   9 ++
  3 files changed, 339 insertions(+), 22 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 88dce54..9b388c8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
  typedef struct QemuConsole QemuConsole;
  typedef struct QEMUFile QEMUFile;
  typedef struct QEMUFileBuffer QEMUFileBuffer;
+typedef struct QEMUFileAioTask QEMUFileAioTask;
  typedef struct QemuLockable QemuLockable;
  typedef struct QemuMutex QemuMutex;
  typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 285c6ef..f42f949 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -29,19 +29,25 @@
  #include "qemu-file.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "block/aio_task.h"
  -#define IO_BUF_SIZE 32768
+#define IO_BUF_SIZE (1024 * 1024)
  #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define IO_BUF_NUM 2


Interesting, how much is it better than if we set to 1, limiting the 
influence of the series to alignment of written chunks?

+#define IO_BUF_ALIGNMENT 512
  -QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512));
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, IO_BUF_ALIGNMENT));
+QEMU_BUILD_BUG_ON(IO_BUF_SIZE > INT_MAX);

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-27 Thread Denis Plotnikov




On 25.04.2020 00:25, Eric Blake wrote:

On 4/13/20 6:12 AM, Denis Plotnikov wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new


asynchronous


"enabled_buffered" callback.


The term "enabled_buffered" does not appear in the patch.  Did you 
mean...




Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 351 
+---

  migration/qemu-file.h   |   9 ++
  3 files changed, 339 insertions(+), 22 deletions(-)




@@ -60,6 +66,22 @@ struct QEMUFile {
  bool shutdown;
  /* currently used buffer */
  QEMUFileBuffer *current_buf;
+    /*
+ * with buffered_mode enabled all the data copied to 512 byte
+ * aligned buffer, including iov data. Then the buffer is passed
+ * to writev_buffer callback.
+ */
+    bool buffered_mode;


..."Asynchronous mode is managed by setting the new buffered_mode 
flag"?  ...




+    /* for async buffer writing */
+    AioTaskPool *pool;
+    /* the list of free buffers, currently used on is NOT there */


s/on/one/


+    QLIST_HEAD(, QEMUFileBuffer) free_buffers;
+};
+
+struct QEMUFileAioTask {
+    AioTask task;
+    QEMUFile *f;
+    QEMUFileBuffer *fb;
  };
    /*
@@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const 
QEMUFileOps *ops)

  f->opaque = opaque;
  f->ops = ops;
  -    f->current_buf = g_new0(QEMUFileBuffer, 1);
-    f->current_buf->buf = g_malloc(IO_BUF_SIZE);
-    f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
-    f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE);
+    if (f->ops->enable_buffered) {
+    f->buffered_mode = f->ops->enable_buffered(f->opaque);


...ah, you meant 'enable_buffered'.  But still, why do we need a 
callback function?  Is it not sufficient to just have a bool flag?




+static size_t get_buf_free_size(QEMUFile *f)
+{
+    QEMUFileBuffer *fb = f->current_buf;
+    /* buf_index can't be greated than buf_size */


greater


+    assert(fb->buf_size >= fb->buf_index);
+    return fb->buf_size - fb->buf_index;
+}
+



+static int write_task_fn(AioTask *task)
+{



+    /*
+ * Increment file position.
+ * This needs to be here before calling writev_buffer, because
+ * writev_buffer is asynchronous and there could be more than one
+ * writev_buffer started simultaniously. Each writev_buffer should


simultaneously


+ * use its own file pos to write to. writev_buffer may write less
+ * than buf_index bytes but we treat this situation as an error.
+ * If error appeared, further file using is meaningless.


s/using/use/


+ * We expect that, the most of the time the full buffer is written,
+ * (when buf_size == buf_index). The only case when the non-full
+ * buffer is written (buf_size != buf_index) is file close,
+ * when we need to flush the rest of the buffer content.


We expect that most of the time, the full buffer will be written 
(buf_size == buf_index), with the exception at file close where we 
need to flush the final partial buffer.



+ */
+    f->pos += fb->buf_index;
+
+    ret = f->ops->writev_buffer(f->opaque, , 1, pos, _error);
+
+    /* return the just written buffer to the free list */
+    QLIST_INSERT_HEAD(>free_buffers, fb, link);
+
+    /* check that we have written everything */
+    if (ret != fb->buf_index) {
+    qemu_file_set_error_obj(f, ret < 0 ? ret : -EIO, local_error);
+    }
+
+    /*
+ * always return 0 - don't use task error handling, relay on


rely


+ * qemu file error handling
+ */
+    return 0;
+}
+
+static void qemu_file_switch_current_buf(QEMUFile *f)
+{
+    /*
+ * if the list is empty, wait until some task returns a buffer
+ * to the list of free buffers.
+ */
+    if (QLIST_EMPTY(>free_buffers)) {
+    aio_task_pool_wait_slot(f->pool);
+    }
+
+    /*
+ * sanity check that the list isn't empty
+ * if the free list was empty, we waited for a task complition,


completion

+ * and the pompleted task must return a buffer to a list of free 
buffers


completed


+ */
+    assert(!QLIST_EMPTY(>free_buffers));
+
+    /* set the current buffer for using from the free list */
+    f->current_buf = QLIST_FIRST(>free_buffers);
+    reset_buf(f);
+
+    QLIST_REMOVE(f->current_buf, link);
+}
+



    /*
+ * Copy an external buffer to the intenal current buffer.


internal


+ */
+static void copy_buf(QEMUFile *f, const uint8_t *buf, size_t size,
+ bool may_free)
+{



+++ b/migration/qemu-file.h
@@ -103,6 +103,14 @@ typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
  typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr,
 Error **errp);
  +/*
+ * Enables or disables the buffered mode
+ * Existing blocking reads/writes must be woken
+ 

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-25 Thread Vladimir Sementsov-Ogievskiy

13.04.2020 14:12, Denis Plotnikov wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new
"enabled_buffered" callback.


Hmm.

I don't like resulting architecture very much:

1. Function naming is not clean: how can I understand that copy_buf is for 
buffered mode when add_to_iovec is +- same thing for non-buffered mode?

Hmm actually, you just alter several significant functions of QEMUFile - open, 
close, put, flush. In old mode we do one thing, in a new mode - absolutely 
another. This looks like a driver. So may be we want to add QEMUFileDriver 
struct, to define these functions as callbacks, move old realizations to 
default driver, and add new functionality as a new driver, what do you think?

2. Terminology: you say you add buffered mode, but actually qemu file is 
already work in buffered mode, so it should be clarified somehow..
You also add asynchronisity, but old implementation has already 
qemu_put_buffer_async..
You use aio task pool, but don't say that it may be used only from coroutine.
May be, we'd better call it "coroutine-mode" ?


Also, am I correct that new mode can't be used for read? Should we 
document/assert it somehow? Or may be support reading? Will switch to snapshot 
benefit if we implement reading in a new mode?



Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 351 +---
  migration/qemu-file.h   |   9 ++
  3 files changed, 339 insertions(+), 22 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 88dce54..9b388c8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
  typedef struct QemuConsole QemuConsole;
  typedef struct QEMUFile QEMUFile;
  typedef struct QEMUFileBuffer QEMUFileBuffer;
+typedef struct QEMUFileAioTask QEMUFileAioTask;
  typedef struct QemuLockable QemuLockable;
  typedef struct QemuMutex QemuMutex;
  typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 285c6ef..f42f949 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -29,19 +29,25 @@
  #include "qemu-file.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "block/aio_task.h"
  
-#define IO_BUF_SIZE 32768

+#define IO_BUF_SIZE (1024 * 1024)
  #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define IO_BUF_NUM 2


Interesting, how much is it better than if we set to 1, limiting the influence 
of the series to alignment of written chunks?


+#define IO_BUF_ALIGNMENT 512
  
-QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512));

+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, IO_BUF_ALIGNMENT));
+QEMU_BUILD_BUG_ON(IO_BUF_SIZE > INT_MAX);
+QEMU_BUILD_BUG_ON(IO_BUF_NUM <= 0);
  
  struct QEMUFileBuffer {

  int buf_index;
-int buf_size; /* 0 when writing */
+int buf_size; /* 0 when non-buffered writing */
  uint8_t *buf;
  unsigned long *may_free;
  struct iovec *iov;
  unsigned int iovcnt;
+QLIST_ENTRY(QEMUFileBuffer) link;
  };
  
  struct QEMUFile {

@@ -60,6 +66,22 @@ struct QEMUFile {
  bool shutdown;
  /* currently used buffer */
  QEMUFileBuffer *current_buf;
+/*
+ * with buffered_mode enabled all the data copied to 512 byte
+ * aligned buffer, including iov data. Then the buffer is passed
+ * to writev_buffer callback.
+ */
+bool buffered_mode;
+/* for async buffer writing */
+AioTaskPool *pool;
+/* the list of free buffers, currently used on is NOT there */
+QLIST_HEAD(, QEMUFileBuffer) free_buffers;
+};
+
+struct QEMUFileAioTask {
+AioTask task;
+QEMUFile *f;
+QEMUFileBuffer *fb;
  };
  
  /*

@@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops)
  f->opaque = opaque;
  f->ops = ops;
  
-f->current_buf = g_new0(QEMUFileBuffer, 1);

-f->current_buf->buf = g_malloc(IO_BUF_SIZE);
-f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
-f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE);
+if (f->ops->enable_buffered) {
+f->buffered_mode = f->ops->enable_buffered(f->opaque);
+}
+
+if (f->buffered_mode && qemu_file_is_writable(f)) {


So we actually go to buffered_mode if file is writable. Then, shouldn't we 
otherwise set buffered_mode to false otherwise?


+int i;
+/*
+ * in buffered_mode we don't use internal io vectors
+ * and may_free bitmap, because we copy the data to be
+ * written right away to the buffer
+ */
+f->pool = aio_task_pool_new(IO_BUF_NUM);
+
+/* allocate io buffers */
+for (i = 0; i < IO_BUF_NUM; i++) {
+QEMUFileBuffer *fb = g_new0(QEMUFileBuffer, 1);
+
+fb->buf = qemu_memalign(IO_BUF_ALIGNMENT, IO_BUF_SIZE);
+

Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-24 Thread Eric Blake

On 4/13/20 6:12 AM, Denis Plotnikov wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new


asynchronous


"enabled_buffered" callback.


The term "enabled_buffered" does not appear in the patch.  Did you mean...



Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 351 +---
  migration/qemu-file.h   |   9 ++
  3 files changed, 339 insertions(+), 22 deletions(-)




@@ -60,6 +66,22 @@ struct QEMUFile {
  bool shutdown;
  /* currently used buffer */
  QEMUFileBuffer *current_buf;
+/*
+ * with buffered_mode enabled all the data copied to 512 byte
+ * aligned buffer, including iov data. Then the buffer is passed
+ * to writev_buffer callback.
+ */
+bool buffered_mode;


..."Asynchronous mode is managed by setting the new buffered_mode flag"? 
 ...




+/* for async buffer writing */
+AioTaskPool *pool;
+/* the list of free buffers, currently used on is NOT there */


s/on/one/


+QLIST_HEAD(, QEMUFileBuffer) free_buffers;
+};
+
+struct QEMUFileAioTask {
+AioTask task;
+QEMUFile *f;
+QEMUFileBuffer *fb;
  };
  
  /*

@@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops)
  f->opaque = opaque;
  f->ops = ops;
  
-f->current_buf = g_new0(QEMUFileBuffer, 1);

-f->current_buf->buf = g_malloc(IO_BUF_SIZE);
-f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
-f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE);
+if (f->ops->enable_buffered) {
+f->buffered_mode = f->ops->enable_buffered(f->opaque);


...ah, you meant 'enable_buffered'.  But still, why do we need a 
callback function?  Is it not sufficient to just have a bool flag?




+static size_t get_buf_free_size(QEMUFile *f)
+{
+QEMUFileBuffer *fb = f->current_buf;
+/* buf_index can't be greated than buf_size */


greater


+assert(fb->buf_size >= fb->buf_index);
+return fb->buf_size - fb->buf_index;
+}
+



+static int write_task_fn(AioTask *task)
+{



+/*
+ * Increment file position.
+ * This needs to be here before calling writev_buffer, because
+ * writev_buffer is asynchronous and there could be more than one
+ * writev_buffer started simultaniously. Each writev_buffer should


simultaneously


+ * use its own file pos to write to. writev_buffer may write less
+ * than buf_index bytes but we treat this situation as an error.
+ * If error appeared, further file using is meaningless.


s/using/use/


+ * We expect that, the most of the time the full buffer is written,
+ * (when buf_size == buf_index). The only case when the non-full
+ * buffer is written (buf_size != buf_index) is file close,
+ * when we need to flush the rest of the buffer content.


We expect that most of the time, the full buffer will be written 
(buf_size == buf_index), with the exception at file close where we need 
to flush the final partial buffer.



+ */
+f->pos += fb->buf_index;
+
+ret = f->ops->writev_buffer(f->opaque, , 1, pos, _error);
+
+/* return the just written buffer to the free list */
+QLIST_INSERT_HEAD(>free_buffers, fb, link);
+
+/* check that we have written everything */
+if (ret != fb->buf_index) {
+qemu_file_set_error_obj(f, ret < 0 ? ret : -EIO, local_error);
+}
+
+/*
+ * always return 0 - don't use task error handling, relay on


rely


+ * qemu file error handling
+ */
+return 0;
+}
+
+static void qemu_file_switch_current_buf(QEMUFile *f)
+{
+/*
+ * if the list is empty, wait until some task returns a buffer
+ * to the list of free buffers.
+ */
+if (QLIST_EMPTY(>free_buffers)) {
+aio_task_pool_wait_slot(f->pool);
+}
+
+/*
+ * sanity check that the list isn't empty
+ * if the free list was empty, we waited for a task complition,


completion


+ * and the pompleted task must return a buffer to a list of free buffers


completed


+ */
+assert(!QLIST_EMPTY(>free_buffers));
+
+/* set the current buffer for using from the free list */
+f->current_buf = QLIST_FIRST(>free_buffers);
+reset_buf(f);
+
+QLIST_REMOVE(f->current_buf, link);
+}
+


  
  /*

+ * Copy an external buffer to the intenal current buffer.


internal


+ */
+static void copy_buf(QEMUFile *f, const uint8_t *buf, size_t size,
+ bool may_free)
+{



+++ b/migration/qemu-file.h
@@ -103,6 +103,14 @@ typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
  typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr,
 Error **errp);
  
+/*

+ * Enables or disables the buffered mode
+ * Existing blocking reads/writes must be woken
+ * Returns true if the buffered mode has 

[RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-13 Thread Denis Plotnikov
The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new
"enabled_buffered" callback.

Signed-off-by: Denis Plotnikov 
---
 include/qemu/typedefs.h |   1 +
 migration/qemu-file.c   | 351 +---
 migration/qemu-file.h   |   9 ++
 3 files changed, 339 insertions(+), 22 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 88dce54..9b388c8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
 typedef struct QemuConsole QemuConsole;
 typedef struct QEMUFile QEMUFile;
 typedef struct QEMUFileBuffer QEMUFileBuffer;
+typedef struct QEMUFileAioTask QEMUFileAioTask;
 typedef struct QemuLockable QemuLockable;
 typedef struct QemuMutex QemuMutex;
 typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 285c6ef..f42f949 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -29,19 +29,25 @@
 #include "qemu-file.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "block/aio_task.h"
 
-#define IO_BUF_SIZE 32768
+#define IO_BUF_SIZE (1024 * 1024)
 #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define IO_BUF_NUM 2
+#define IO_BUF_ALIGNMENT 512
 
-QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512));
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, IO_BUF_ALIGNMENT));
+QEMU_BUILD_BUG_ON(IO_BUF_SIZE > INT_MAX);
+QEMU_BUILD_BUG_ON(IO_BUF_NUM <= 0);
 
 struct QEMUFileBuffer {
 int buf_index;
-int buf_size; /* 0 when writing */
+int buf_size; /* 0 when non-buffered writing */
 uint8_t *buf;
 unsigned long *may_free;
 struct iovec *iov;
 unsigned int iovcnt;
+QLIST_ENTRY(QEMUFileBuffer) link;
 };
 
 struct QEMUFile {
@@ -60,6 +66,22 @@ struct QEMUFile {
 bool shutdown;
 /* currently used buffer */
 QEMUFileBuffer *current_buf;
+/*
+ * with buffered_mode enabled all the data copied to 512 byte
+ * aligned buffer, including iov data. Then the buffer is passed
+ * to writev_buffer callback.
+ */
+bool buffered_mode;
+/* for async buffer writing */
+AioTaskPool *pool;
+/* the list of free buffers, currently used on is NOT there */
+QLIST_HEAD(, QEMUFileBuffer) free_buffers;
+};
+
+struct QEMUFileAioTask {
+AioTask task;
+QEMUFile *f;
+QEMUFileBuffer *fb;
 };
 
 /*
@@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops)
 f->opaque = opaque;
 f->ops = ops;
 
-f->current_buf = g_new0(QEMUFileBuffer, 1);
-f->current_buf->buf = g_malloc(IO_BUF_SIZE);
-f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
-f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE);
+if (f->ops->enable_buffered) {
+f->buffered_mode = f->ops->enable_buffered(f->opaque);
+}
+
+if (f->buffered_mode && qemu_file_is_writable(f)) {
+int i;
+/*
+ * in buffered_mode we don't use internal io vectors
+ * and may_free bitmap, because we copy the data to be
+ * written right away to the buffer
+ */
+f->pool = aio_task_pool_new(IO_BUF_NUM);
+
+/* allocate io buffers */
+for (i = 0; i < IO_BUF_NUM; i++) {
+QEMUFileBuffer *fb = g_new0(QEMUFileBuffer, 1);
+
+fb->buf = qemu_memalign(IO_BUF_ALIGNMENT, IO_BUF_SIZE);
+fb->buf_size = IO_BUF_SIZE;
+
+/*
+ * put the first buffer to the current buf and the rest
+ * to the list of free buffers
+ */
+if (i == 0) {
+f->current_buf = fb;
+} else {
+QLIST_INSERT_HEAD(>free_buffers, fb, link);
+}
+}
+} else {
+f->current_buf = g_new0(QEMUFileBuffer, 1);
+f->current_buf->buf = g_malloc(IO_BUF_SIZE);
+f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
+f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE);
+}
 
 return f;
 }
@@ -190,6 +244,8 @@ static void qemu_iovec_release_ram(QEMUFile *f)
 unsigned long idx;
 QEMUFileBuffer *fb = f->current_buf;
 
+assert(!f->buffered_mode);
+
 /* Find and release all the contiguous memory ranges marked as may_free. */
 idx = find_next_bit(fb->may_free, fb->iovcnt, 0);
 if (idx >= fb->iovcnt) {
@@ -221,6 +277,147 @@ static void qemu_iovec_release_ram(QEMUFile *f)
 bitmap_zero(fb->may_free, MAX_IOV_SIZE);
 }
 
+static void advance_buf_ptr(QEMUFile *f, size_t size)
+{
+QEMUFileBuffer *fb = f->current_buf;
+/* must not advance to 0 */
+assert(size);
+/* must not overflow buf_index (int) */
+assert(fb->buf_index + size <= INT_MAX);
+/* must not exceed buf_size */
+assert(fb->buf_index + size <= fb->buf_size);
+
+fb->buf_index += size;
+}
+
+static size_t