Re: [RFC PATCH v2 3/3] block: add sheepdog driver for distributed storage support
Hi, Thank you very much for the reviewing! At Fri, 14 May 2010 13:08:06 +0200, Kevin Wolf wrote: + +struct sd_req { + uint8_t proto_ver; + uint8_t opcode; + uint16_tflags; + uint32_tepoch; + uint32_tid; + uint32_tdata_length; + uint32_topcode_specific[8]; +}; CODING_STYLE says that structs should be typedefed and their names should be in CamelCase. So something like this: typedef struct SheepdogReq { ... } SheepdogReq; (Or, if your prefer, SDReq; but with things like SDAIOCB I think it becomes hard to read) I see. I use Sheepdog as a prefix, like SheepdogReq. +/* + +#undef eprintf +#define eprintf(fmt, args...) \ +do { \ + fprintf(stderr, %s %d: fmt, __func__, __LINE__, ##args); \ +} while (0) What about using error_report() instead of fprintf? Though it should be the same currently. Yes, using common helper functions is better. I replaced all the printf. + + for (i = 0; i ARRAY_SIZE(errors); ++i) + if (errors[i].err == err) + return errors[i].desc; CODING_STYLE requires braces here. I fixed all the missing braces. + + return Invalid error code; +} + +static inline int before(uint32_t seq1, uint32_t seq2) +{ +return (int32_t)(seq1 - seq2) 0; +} + +static inline int after(uint32_t seq1, uint32_t seq2) +{ + return (int32_t)(seq2 - seq1) 0; +} These functions look strange... Is the difference to seq1 seq2 that the cast introduces intentional? (after(0x0, 0xabcdefff) == 1) If yes, why is this useful? This needs a comment. If no, why even bother to have this function instead of directly using or ? These functions are used to compare sequential numbers which can be wrap-around. For example, linux/net/tcp.h in the linux kernel. Anyway, sheepdog doesn't use these functions, so I removed them. + if (snapid) + dprintf(% PRIx32 non current inode was open.\n, vid); + else + s-is_current = 1; + + fd = connect_to_sdog(s-addr); I wonder why you need to open another connection here instead of using s-fd. This pattern repeats at least in the snapshot functions, so I'm sure it's there for a reason. Maybe add a comment? We can use s-fd only for aio read/write operations. It is because the block driver may be during waiting response from the server, so we cannot send other requests to the discriptor to avoid receiving wrong data. I added the comment to get_sheep_fd(). + + iov.iov_base = s-inode; + iov.iov_len = sizeof(s-inode); + aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s-inode.vdi_id), + data_len, offset, 0, 0, offset); + if (!aio_req) { + eprintf(too many requests\n); + acb-ret = -EIO; + goto out; + } Randomly failing requests is probably not a good idea. The guest might decide that the disk/file system is broken and stop using it. Can't you use a list like in AIOPool, so you can dynamically add new requests as needed? I agree. In the v3 patch, AIO requests are allocated dynamically, and all the requests are linked to the outstanding_aio_head in the BDRVSheepdogState. + +static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id) +{ + struct bdrv_sd_state *s = bs-opaque; + struct bdrv_sd_state *old_s; + char vdi[SD_MAX_VDI_LEN]; + char *buf = NULL; + uint32_t vid; + uint32_t snapid = 0; + int ret = -ENOENT, fd; + + old_s = qemu_malloc(sizeof(struct bdrv_sd_state)); + if (!old_s) { qemu_malloc never returns NULL. I removed all the NULL checks. + +BlockDriver bdrv_sheepdog = { + .format_name = sheepdog, + .protocol_name = sheepdog, + .instance_size = sizeof(struct bdrv_sd_state), + .bdrv_file_open = sd_open, + .bdrv_close = sd_close, + .bdrv_create = sd_create, + + .bdrv_aio_readv = sd_aio_readv, + .bdrv_aio_writev = sd_aio_writev, + + .bdrv_snapshot_create = sd_snapshot_create, + .bdrv_snapshot_goto = sd_snapshot_goto, + .bdrv_snapshot_delete = sd_snapshot_delete, + .bdrv_snapshot_list = sd_snapshot_list, + + .bdrv_save_vmstate = sd_save_vmstate, + .bdrv_load_vmstate = sd_load_vmstate, + + .create_options = sd_create_options, +}; Please align the = to the same column, at least in each block. I have aligned in the v3 patch. Thanks, Kazutaka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 3/3] block: add sheepdog driver for distributed storage support
Am 14.05.2010 11:51, schrieb MORITA Kazutaka: Sheepdog is a distributed storage system for QEMU. It provides highly available block level storage volumes to VMs like Amazon EBS. This patch adds a qemu block driver for Sheepdog. Sheepdog features are: - No node in the cluster is special (no metadata node, no control node, etc) - Linear scalability in performance and capacity - No single point of failure - Autonomous management (zero configuration) - Useful volume management support such as snapshot and cloning - Thin provisioning - Autonomous load balancing The more details are available at the project site: http://www.osrg.net/sheepdog/ Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp Once we solved the image creation thing, I think I'm going to be happy with the block interface. Of course, this is something that doesn't even directly affect the driver code, just the way it is used. I have no clue about the Sheepdog protocol, so I'm just trying to comment on some general details. --- Makefile.objs|2 +- block/sheepdog.c | 1831 ++ 2 files changed, 1832 insertions(+), 1 deletions(-) create mode 100644 block/sheepdog.c diff --git a/Makefile.objs b/Makefile.objs index ecdd53e..6edbc57 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -14,7 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o -block-nested-y += parallels.o nbd.o blkdebug.o +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o block-nested-$(CONFIG_WIN32) += raw-win32.o block-nested-$(CONFIG_POSIX) += raw-posix.o block-nested-$(CONFIG_CURL) += curl.o diff --git a/block/sheepdog.c b/block/sheepdog.c new file mode 100644 index 000..adf3a71 --- /dev/null +++ b/block/sheepdog.c @@ -0,0 +1,1831 @@ +/* + * Copyright (C) 2009-2010 Nippon Telegraph and Telephone Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version + * 2 as published by the Free Software Foundation. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ +#include netdb.h +#include netinet/tcp.h + +#include qemu-common.h +#include block_int.h + +#define SD_PROTO_VER 0x01 + +#define SD_DEFAULT_ADDR localhost:7000 + +#define SD_OP_CREATE_AND_WRITE_OBJ 0x01 +#define SD_OP_READ_OBJ 0x02 +#define SD_OP_WRITE_OBJ 0x03 + +#define SD_OP_NEW_VDI0x11 +#define SD_OP_LOCK_VDI 0x12 +#define SD_OP_RELEASE_VDI0x13 +#define SD_OP_GET_VDI_INFO 0x14 +#define SD_OP_READ_VDIS 0x15 + +#define SD_FLAG_CMD_WRITE0x01 +#define SD_FLAG_CMD_COW 0x02 + +#define SD_RES_SUCCESS 0x00 /* Success */ +#define SD_RES_UNKNOWN 0x01 /* Unknown error */ +#define SD_RES_NO_OBJ0x02 /* No object found */ +#define SD_RES_EIO 0x03 /* I/O error */ +#define SD_RES_VDI_EXIST 0x04 /* Vdi exists already */ +#define SD_RES_INVALID_PARMS 0x05 /* Invalid parameters */ +#define SD_RES_SYSTEM_ERROR 0x06 /* System error */ +#define SD_RES_VDI_LOCKED0x07 /* Vdi is locked */ +#define SD_RES_NO_VDI0x08 /* No vdi found */ +#define SD_RES_NO_BASE_VDI 0x09 /* No base vdi found */ +#define SD_RES_VDI_READ 0x0A /* Cannot read requested vdi */ +#define SD_RES_VDI_WRITE 0x0B /* Cannot write requested vdi */ +#define SD_RES_BASE_VDI_READ 0x0C /* Cannot read base vdi */ +#define SD_RES_BASE_VDI_WRITE 0x0D /* Cannot write base vdi */ +#define SD_RES_NO_TAG0x0E /* Requested tag is not found */ +#define SD_RES_STARTUP 0x0F /* Sheepdog is on starting up */ +#define SD_RES_VDI_NOT_LOCKED 0x10 /* Vdi is not locked */ +#define SD_RES_SHUTDOWN 0x11 /* Sheepdog is shutting down */ +#define SD_RES_NO_MEM0x12 /* Cannot allocate memory */ +#define SD_RES_FULL_VDI 0x13 /* we already have the maximum vdis */ +#define SD_RES_VER_MISMATCH 0x14 /* Protocol version mismatch */ +#define SD_RES_NO_SPACE 0x15 /* Server has no room for new objects */ +#define SD_RES_WAIT_FOR_FORMAT 0x16 /* Sheepdog is waiting for a format operation */ +#define SD_RES_WAIT_FOR_JOIN0x17 /* Sheepdog is waiting for other nodes joining */ +#define SD_RES_JOIN_FAILED 0x18 /* Target node had failed to join sheepdog */ + +/* + * Object ID rules + * + * 0 - 19 (20 bits): data object space + * 20 - 31 (12 bits): reserved data object space + * 32 - 55 (24 bits): vdi object space + * 56 - 59 ( 4 bits): reserved vdi object space + * 60 - 63 ( 4 bits): object type indentifier space + */ + +#define VDI_SPACE_SHIFT 32 +#define VDI_BIT (UINT64_C(1) 63) +#define VMSTATE_BIT