On Tue, Feb 10, 2015 at 06:56:33PM +0900, Teruaki Ishizaki wrote: > (2015/02/10 17:58), Liu Yuan wrote: > >On Tue, Feb 10, 2015 at 05:22:02PM +0900, Teruaki Ishizaki wrote: > >>(2015/02/10 12:10), Liu Yuan wrote: > >>>On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote: > >>>>Previously, qemu block driver of sheepdog used hard-coded VDI object size. > >>>>This patch enables users to handle "block_size_shift" value for > >>>>calculating VDI object size. > >>>> > >>>>When you start qemu, you don't need to specify additional command option. > >>>> > >>>>But when you create the VDI which doesn't have default object size > >>>>with qemu-img command, you specify block_size_shift option. > >>>> > >>>>If you want to create a VDI of 8MB(1 << 23) object size, > >>>>you need to specify following command option. > >>>> > >>>> # qemu-img create -o block_size_shift=23 sheepdog:test1 100M > >>>> > >>>>In addition, when you don't specify qemu-img command option, > >>>>a default value of sheepdog cluster is used for creating VDI. > >>>> > >>>> # qemu-img create sheepdog:test2 100M > >>>> > >>>>Signed-off-by: Teruaki Ishizaki <ishizaki.teru...@lab.ntt.co.jp> > >>>>--- > >>>>V4: > >>>> - Limit a read/write buffer size for creating a preallocated VDI. > >>>> - Replace a parse function for the block_size_shift option. > >>>> - Fix an error message. > >>>> > >>>>V3: > >>>> - Delete the needless operation of buffer. > >>>> - Delete the needless operations of request header. > >>>> for SD_OP_GET_CLUSTER_DEFAULT. > >>>> - Fix coding style problems. > >>>> > >>>>V2: > >>>> - Fix coding style problem (white space). > >>>> - Add members, store_policy and block_size_shift to struct > >>>> SheepdogVdiReq. > >>>> - Initialize request header to use block_size_shift specified by user. > >>>>--- > >>>> block/sheepdog.c | 138 > >>>> ++++++++++++++++++++++++++++++++++++++------- > >>>> include/block/block_int.h | 1 + > >>>> 2 files changed, 119 insertions(+), 20 deletions(-) > >>>> > >>>>diff --git a/block/sheepdog.c b/block/sheepdog.c > >>>>index be3176f..a43b947 100644 > >>>>--- a/block/sheepdog.c > >>>>+++ b/block/sheepdog.c > >>>>@@ -37,6 +37,7 @@ > >>>> #define SD_OP_READ_VDIS 0x15 > >>>> #define SD_OP_FLUSH_VDI 0x16 > >>>> #define SD_OP_DEL_VDI 0x17 > >>>>+#define SD_OP_GET_CLUSTER_DEFAULT 0x18 > >>>> > >>>> #define SD_FLAG_CMD_WRITE 0x01 > >>>> #define SD_FLAG_CMD_COW 0x02 > >>>>@@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq { > >>>> uint32_t base_vdi_id; > >>>> uint8_t copies; > >>>> uint8_t copy_policy; > >>>>- uint8_t reserved[2]; > >>>>+ uint8_t store_policy; > >>>>+ uint8_t block_size_shift; > >>>> uint32_t snapid; > >>>> uint32_t type; > >>>> uint32_t pad[2]; > >>>>@@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp { > >>>> uint32_t pad[5]; > >>>> } SheepdogVdiRsp; > >>>> > >>>>+typedef struct SheepdogClusterRsp { > >>>>+ uint8_t proto_ver; > >>>>+ uint8_t opcode; > >>>>+ uint16_t flags; > >>>>+ uint32_t epoch; > >>>>+ uint32_t id; > >>>>+ uint32_t data_length; > >>>>+ uint32_t result; > >>>>+ uint8_t nr_copies; > >>>>+ uint8_t copy_policy; > >>>>+ uint8_t block_size_shift; > >>>>+ uint8_t __pad1; > >>>>+ uint32_t __pad2[6]; > >>>>+} SheepdogClusterRsp; > >>>>+ > >>>> typedef struct SheepdogInode { > >>>> char name[SD_MAX_VDI_LEN]; > >>>> char tag[SD_MAX_VDI_TAG_LEN]; > >>>>@@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, > >>>>uint32_t *vdi_id, int snapshot, > >>>> hdr.vdi_size = s->inode.vdi_size; > >>>> hdr.copy_policy = s->inode.copy_policy; > >>>> hdr.copies = s->inode.nr_copies; > >>>>+ hdr.block_size_shift = s->inode.block_size_shift; > >>>> > >>>> ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, > >>>> &rlen); > >>>> > >>>>@@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, > >>>>uint32_t *vdi_id, int snapshot, > >>>> static int sd_prealloc(const char *filename, Error **errp) > >>>> { > >>>> BlockDriverState *bs = NULL; > >>>>+ BDRVSheepdogState *base = NULL; > >>>>+ unsigned long buf_size; > >>>> uint32_t idx, max_idx; > >>>>+ uint32_t object_size; > >>>> int64_t vdi_size; > >>>>- void *buf = g_malloc0(SD_DATA_OBJ_SIZE); > >>>>+ void *buf = NULL; > >>>> int ret; > >>>> > >>>> ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | > >>>> BDRV_O_PROTOCOL, > >>>>@@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, > >>>>Error **errp) > >>>> ret = vdi_size; > >>>> goto out; > >>>> } > >>>>- max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE); > >>>>+ > >>>>+ base = bs->opaque; > >>>>+ object_size = (UINT32_C(1) << base->inode.block_size_shift); > >>>>+ buf_size = MIN(object_size, SD_DATA_OBJ_SIZE); > >>>>+ buf = g_malloc0(buf_size); > >>>>+ > >>>>+ max_idx = DIV_ROUND_UP(vdi_size, buf_size); > >>>> > >>>> for (idx = 0; idx < max_idx; idx++) { > >>>> /* > >>>> * The created image can be a cloned image, so we need to read > >>>> * a data from the source image. > >>>> */ > >>>>- ret = bdrv_pread(bs, idx * SD_DATA_OBJ_SIZE, buf, > >>>>SD_DATA_OBJ_SIZE); > >>>>+ ret = bdrv_pread(bs, idx * buf_size, buf, buf_size); > >>>> if (ret < 0) { > >>>> goto out; > >>>> } > >>>>- ret = bdrv_pwrite(bs, idx * SD_DATA_OBJ_SIZE, buf, > >>>>SD_DATA_OBJ_SIZE); > >>>>+ ret = bdrv_pwrite(bs, idx * buf_size, buf, buf_size); > >>>> if (ret < 0) { > >>>> goto out; > >>>> } > >>>>@@ -1669,6 +1696,21 @@ static int parse_redundancy(BDRVSheepdogState *s, > >>>>const char *opt) > >>>> return 0; > >>>> } > >>>> > >>>>+static int parse_block_size_shift(BDRVSheepdogState *s, QemuOpts *opt) > >>>>+{ > >>>>+ struct SheepdogInode *inode = &s->inode; > >>>>+ inode->block_size_shift = > >>>>+ (uint8_t)qemu_opt_get_number_del(opt, "block_size_shift", 0); > >>>>+ if (inode->block_size_shift == 0) { > >>>>+ /* block_size_shift is set for cluster default value by sheepdog > >>>>*/ > >>>>+ return 0; > >>>>+ } else if (inode->block_size_shift < 20 || inode->block_size_shift > > >>>>31) { > >>>>+ return -EINVAL; > >>>>+ } > >>>>+ > >>>>+ return 0; > >>>>+} > >>>>+ > >>>> static int sd_create(const char *filename, QemuOpts *opts, > >>>> Error **errp) > >>>> { > >>>>@@ -1679,6 +1721,7 @@ static int sd_create(const char *filename, QemuOpts > >>>>*opts, > >>>> BDRVSheepdogState *s; > >>>> char tag[SD_MAX_VDI_TAG_LEN]; > >>>> uint32_t snapid; > >>>>+ uint64_t max_vdi_size; > >>>> bool prealloc = false; > >>>> > >>>> s = g_new0(BDRVSheepdogState, 1); > >>>>@@ -1718,9 +1761,10 @@ static int sd_create(const char *filename, > >>>>QemuOpts *opts, > >>>> } > >>>> } > >>>> > >>>>- if (s->inode.vdi_size > SD_MAX_VDI_SIZE) { > >>>>- error_setg(errp, "too big image size"); > >>>>- ret = -EINVAL; > >>>>+ ret = parse_block_size_shift(s, opts); > >>>>+ if (ret < 0) { > >>>>+ error_setg(errp, "Invalid block_size_shift:" > >>>>+ " '%s'. please use 20-31", buf); > >>>> goto out; > >>>> } > >>>> > >>>>@@ -1757,6 +1801,47 @@ static int sd_create(const char *filename, > >>>>QemuOpts *opts, > >>>> } > >>>> > >>>> s->aio_context = qemu_get_aio_context(); > >>>>+ > >>>>+ /* if block_size_shift is not specified, get cluster default value */ > >>>>+ if (s->inode.block_size_shift == 0) { > >>> > >>>Seems that we don't keep backward compatibility for new QEMU and old sheep > >>>that > >>>doesn't support SD_OP_GET_CLUSTER_DEFAULT. > >>Then, I'll add the operation that if block_size_shift from > >>SD_OP_GET_CLUSTER_DEFAULT is zero, block_size_shift is set to 22. > >>Is it OK? > > > >If sheep doesn't support SD_OP_GET_CLUSTER_DEFAULT, sheep will return > >SD_RES_INVALID_PARMS. So to keep backward compatibility, we shouldn't issue > >SD_OP_GET_CLUSTER_DEFAULT to sheep daemon. > OK, I think that the way is better. > > > >My point is that, after user upgrading the sheep and still stick with old > >QEMU, > >(I guess many users will), any operations in the past sholudn't fail right > >after > >upgrading. > > > >> > >>> > >>>What will happen if old QEMU with new sheep that support block_size_shift? > >>>Most > >>>distributions will ship the old stable qemu that wouldn't be aware of > >>>block_size_shift. > >>If old QEMU with new sheep is used, VDI is created with cluster default > >>block_size_shift and accessed as 4MB object_size. > >>So I think that for backward compatibility, users must do cluster > >>format command with default block_size_shift 22 equal to > >>4MB object_size. > > > >how old QEMU know about block_size_shift? For old QEMU, block_size_shift is > >encoded as 0 and then send the create request to sheep. Does sheep can handle > >block_size_shift = 0? You know, we can't pass any value to old QEMU for > >block_size_shift. > Sheep can handle block_size_shift = 0, when users create new VDI. > Old QEMU does do_sd_create() without setting hdr.block_size_shift and > sends a request SD_OP_NEW_VDI. > If block_size_shift is set to zero, new sheep sets cluster default value > in cluster_new_vdi() like copies and copy_policy.
Okay, so new sheep can handle old QEMU request. By the way, how about the suggestion in my last email? (x + 1) * 4MB stuff... Thanks Yuan