Re: [PATCH v5 8/8] libvduse: Add support for reconnecting

2022-05-19 Thread Stefan Hajnoczi
On Thu, May 19, 2022 at 06:02:50PM +0800, Yongji Xie wrote:
> On Thu, May 19, 2022 at 5:44 PM Stefan Hajnoczi  wrote:
> >
> > On Thu, May 19, 2022 at 04:25:13PM +0800, Yongji Xie wrote:
> > > On Wed, May 18, 2022 at 10:03 PM Stefan Hajnoczi  
> > > wrote:
> > > >
> > > > On Wed, May 04, 2022 at 03:40:51PM +0800, Xie Yongji wrote:
> > > > > @@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport 
> > > > > *exp, BlockExportOptions *opts,
> > > > >  return -ENOMEM;
> > > > >  }
> > > > >
> > > > > +vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
> > > > > +   g_get_tmp_dir(), exp->id);
> > > >
> > > > g_get_tmp_dir() returns the $TMPDIR environment variable. This means
> > > > exp->id must be unique across the host. Please document this.
> > > >
> > >
> > > Now we also use exp->id as the name of vduse device which should also
> > > be unique across the host. So I'm not sure if it's better to add a new
> > > unique id for vduse since the exp->id is now used by all block
> > > exports.
> >
> > Good point, I forgot that the VDUSE device name must be unique! It's a
> > little more flexible to have a separate vduse-name= option which is used
> > for the VDUSE device name and the reconnection filename, but also a
> > little more typing. I'm not sure if it's necessary to separate the two
> > concepts.
> >
> 
> OK, let's reuse the exp->id and document this first. We can add this
> new option if we need it in the future.

Thanks!

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v5 8/8] libvduse: Add support for reconnecting

2022-05-19 Thread Yongji Xie
On Thu, May 19, 2022 at 5:44 PM Stefan Hajnoczi  wrote:
>
> On Thu, May 19, 2022 at 04:25:13PM +0800, Yongji Xie wrote:
> > On Wed, May 18, 2022 at 10:03 PM Stefan Hajnoczi  
> > wrote:
> > >
> > > On Wed, May 04, 2022 at 03:40:51PM +0800, Xie Yongji wrote:
> > > > @@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
> > > > BlockExportOptions *opts,
> > > >  return -ENOMEM;
> > > >  }
> > > >
> > > > +vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
> > > > +   g_get_tmp_dir(), exp->id);
> > >
> > > g_get_tmp_dir() returns the $TMPDIR environment variable. This means
> > > exp->id must be unique across the host. Please document this.
> > >
> >
> > Now we also use exp->id as the name of vduse device which should also
> > be unique across the host. So I'm not sure if it's better to add a new
> > unique id for vduse since the exp->id is now used by all block
> > exports.
>
> Good point, I forgot that the VDUSE device name must be unique! It's a
> little more flexible to have a separate vduse-name= option which is used
> for the VDUSE device name and the reconnection filename, but also a
> little more typing. I'm not sure if it's necessary to separate the two
> concepts.
>

OK, let's reuse the exp->id and document this first. We can add this
new option if we need it in the future.

Thanks,
Yongji



Re: [PATCH v5 8/8] libvduse: Add support for reconnecting

2022-05-19 Thread Stefan Hajnoczi
On Thu, May 19, 2022 at 04:25:13PM +0800, Yongji Xie wrote:
> On Wed, May 18, 2022 at 10:03 PM Stefan Hajnoczi  wrote:
> >
> > On Wed, May 04, 2022 at 03:40:51PM +0800, Xie Yongji wrote:
> > > @@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
> > > BlockExportOptions *opts,
> > >  return -ENOMEM;
> > >  }
> > >
> > > +vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
> > > +   g_get_tmp_dir(), exp->id);
> >
> > g_get_tmp_dir() returns the $TMPDIR environment variable. This means
> > exp->id must be unique across the host. Please document this.
> >
> 
> Now we also use exp->id as the name of vduse device which should also
> be unique across the host. So I'm not sure if it's better to add a new
> unique id for vduse since the exp->id is now used by all block
> exports.

Good point, I forgot that the VDUSE device name must be unique! It's a
little more flexible to have a separate vduse-name= option which is used
for the VDUSE device name and the reconnection filename, but also a
little more typing. I'm not sure if it's necessary to separate the two
concepts.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v5 8/8] libvduse: Add support for reconnecting

2022-05-19 Thread Yongji Xie
On Wed, May 18, 2022 at 10:03 PM Stefan Hajnoczi  wrote:
>
> On Wed, May 04, 2022 at 03:40:51PM +0800, Xie Yongji wrote:
> > @@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
> > BlockExportOptions *opts,
> >  return -ENOMEM;
> >  }
> >
> > +vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
> > +   g_get_tmp_dir(), exp->id);
>
> g_get_tmp_dir() returns the $TMPDIR environment variable. This means
> exp->id must be unique across the host. Please document this.
>

Now we also use exp->id as the name of vduse device which should also
be unique across the host. So I'm not sure if it's better to add a new
unique id for vduse since the exp->id is now used by all block
exports.

Thanks,
Yongji



Re: [PATCH v5 8/8] libvduse: Add support for reconnecting

2022-05-18 Thread Stefan Hajnoczi
On Wed, May 04, 2022 at 03:40:51PM +0800, Xie Yongji wrote:
> @@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
> BlockExportOptions *opts,
>  return -ENOMEM;
>  }
>  
> +vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
> +   g_get_tmp_dir(), exp->id);

g_get_tmp_dir() returns the $TMPDIR environment variable. This means
exp->id must be unique across the host. Please document this.

Thanks,
Stefan


signature.asc
Description: PGP signature


[PATCH v5 8/8] libvduse: Add support for reconnecting

2022-05-04 Thread Xie Yongji
To support reconnecting after restart or crash, VDUSE backend
might need to resubmit inflight I/Os. This stores the metadata
such as the index of inflight I/O's descriptors to a shm file so
that VDUSE backend can restore them during reconnecting.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|  14 ++
 subprojects/libvduse/libvduse.c | 235 +++-
 subprojects/libvduse/libvduse.h |  12 ++
 3 files changed, 256 insertions(+), 5 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 2b72baf7ab..1fd787a862 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -33,6 +33,7 @@ typedef struct VduseBlkExport {
 VduseDev *dev;
 uint16_t num_queues;
 bool writable;
+char *recon_file;
 } VduseBlkExport;
 
 typedef struct VduseBlkReq {
@@ -111,6 +112,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, 
VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick afer reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 return -ENOMEM;
 }
 
+vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
+   g_get_tmp_dir(), exp->id);
+if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
+error_setg(errp, "failed to set reconnect log file");
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+return -EINVAL;
+}
+
 for (i = 0; i < num_queues; i++) {
 vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
 }
@@ -314,6 +326,8 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
+unlink(vblk_exp->recon_file);
+g_free(vblk_exp->recon_file);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index ecee9c0568..b27145ceed 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+uint8_t inflight;
+uint8_t padding[5];
+uint16_t next;
+uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+uint64_t features;
+uint16_t version;
+uint16_t desc_num;
+uint16_t last_batch_head;
+uint16_t used_idx;
+VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+uint16_t index;
+uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
 unsigned int num;
 uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
 bool ready;
 int fd;
 VduseDev *dev;
+VduseVirtqInflightDesc *resubmit_list;
+uint16_t resubmit_num;
+uint64_t counter;
+VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,36 @@ struct VduseDev {
 int fd;
 int ctrl_fd;
 void *priv;
+void *log;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *filename, size_t size)
+{
+void *ptr = MAP_FAILED;
+int fd;
+
+fd = open(filename, O_RDWR | O_CREAT, 0600);
+if (fd == -1) {
+return MAP_FAILED;
+}
+
+if (ftruncate(fd, size) == -1) {
+goto out;
+}
+
+ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+out:
+close(fd);
+return ptr;
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -148,6 +207,105 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, );
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+   *desc1 = (VduseVirtqInflightDesc *)b;
+
+if (desc1->counter > desc0->counter &&
+(desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+return 1;
+}
+
+return -1;
+}
+
+static int vduse_queue_check_inflights(VduseVirtq *vq)
+{
+int i = 0;
+VduseDev *dev =