Re: [RFC PATCH v2 3/3] block: add sheepdog driver for distributed storage support

2010-05-17 Thread MORITA Kazutaka
Hi,

Thank you very much for the reviewing!

At Fri, 14 May 2010 13:08:06 +0200,
Kevin Wolf wrote:

  +
  +struct sd_req {
  +   uint8_t proto_ver;
  +   uint8_t opcode;
  +   uint16_tflags;
  +   uint32_tepoch;
  +   uint32_tid;
  +   uint32_tdata_length;
  +   uint32_topcode_specific[8];
  +};
 
 CODING_STYLE says that structs should be typedefed and their names
 should be in CamelCase. So something like this:
 
 typedef struct SheepdogReq {
 ...
 } SheepdogReq;
 
 (Or, if your prefer, SDReq; but with things like SDAIOCB I think it
 becomes hard to read)
 

I see. I use Sheepdog as a prefix, like SheepdogReq.


  +/*
  +
  +#undef eprintf
  +#define eprintf(fmt, args...)  
  \
  +do {   
  \
  +   fprintf(stderr, %s %d:  fmt, __func__, __LINE__, ##args); \
  +} while (0)
 
 What about using error_report() instead of fprintf? Though it should be
 the same currently.
 

Yes, using common helper functions is better.  I replaced all the
printf.


  +
  +   for (i = 0; i  ARRAY_SIZE(errors); ++i)
  +   if (errors[i].err == err)
  +   return errors[i].desc;
 
 CODING_STYLE requires braces here.
 

I fixed all the missing braces.


  +
  +   return Invalid error code;
  +}
  +
  +static inline int before(uint32_t seq1, uint32_t seq2)
  +{
  +return (int32_t)(seq1 - seq2)  0;
  +}
  +
  +static inline int after(uint32_t seq1, uint32_t seq2)
  +{
  +   return (int32_t)(seq2 - seq1)  0;
  +}
 
 These functions look strange... Is the difference to seq1  seq2 that
 the cast introduces intentional? (after(0x0, 0xabcdefff) == 1)
 
 If yes, why is this useful? This needs a comment. If no, why even bother
 to have this function instead of directly using  or  ?
 

These functions are used to compare sequential numbers which can be
wrap-around. For example, linux/net/tcp.h in the linux kernel.

Anyway, sheepdog doesn't use these functions, so I removed them.


  +   if (snapid)
  +   dprintf(% PRIx32  non current inode was open.\n, vid);
  +   else
  +   s-is_current = 1;
  +
  +   fd = connect_to_sdog(s-addr);
 
 I wonder why you need to open another connection here instead of using
 s-fd. This pattern repeats at least in the snapshot functions, so I'm
 sure it's there for a reason. Maybe add a comment?
 

We can use s-fd only for aio read/write operations.  It is because
the block driver may be during waiting response from the server, so we
cannot send other requests to the discriptor to avoid receiving wrong
data.

I added the comment to get_sheep_fd().


  +
  +   iov.iov_base = s-inode;
  +   iov.iov_len = sizeof(s-inode);
  +   aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s-inode.vdi_id),
  +   data_len, offset, 0, 0, offset);
  +   if (!aio_req) {
  +   eprintf(too many requests\n);
  +   acb-ret = -EIO;
  +   goto out;
  +   }
 
 Randomly failing requests is probably not a good idea. The guest might
 decide that the disk/file system is broken and stop using it. Can't you
 use a list like in AIOPool, so you can dynamically add new requests as
 needed?
 

I agree.  In the v3 patch, AIO requests are allocated dynamically, and
all the requests are linked to the outstanding_aio_head in the
BDRVSheepdogState.


  +
  +static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
  +{
  +   struct bdrv_sd_state *s = bs-opaque;
  +   struct bdrv_sd_state *old_s;
  +   char vdi[SD_MAX_VDI_LEN];
  +   char *buf = NULL;
  +   uint32_t vid;
  +   uint32_t snapid = 0;
  +   int ret = -ENOENT, fd;
  +
  +   old_s = qemu_malloc(sizeof(struct bdrv_sd_state));
  +   if (!old_s) {
 
 qemu_malloc never returns NULL.
 

I removed all the NULL checks.


  +
  +BlockDriver bdrv_sheepdog = {
  +   .format_name = sheepdog,
  +   .protocol_name = sheepdog,
  +   .instance_size = sizeof(struct bdrv_sd_state),
  +   .bdrv_file_open = sd_open,
  +   .bdrv_close = sd_close,
  +   .bdrv_create = sd_create,
  +
  +   .bdrv_aio_readv = sd_aio_readv,
  +   .bdrv_aio_writev = sd_aio_writev,
  +
  +   .bdrv_snapshot_create = sd_snapshot_create,
  +   .bdrv_snapshot_goto = sd_snapshot_goto,
  +   .bdrv_snapshot_delete = sd_snapshot_delete,
  +   .bdrv_snapshot_list = sd_snapshot_list,
  +
  +   .bdrv_save_vmstate = sd_save_vmstate,
  +   .bdrv_load_vmstate = sd_load_vmstate,
  +
  +   .create_options = sd_create_options,
  +};
 
 Please align the = to the same column, at least in each block.
 

I have aligned in the v3 patch.


Thanks,

Kazutaka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 3/3] block: add sheepdog driver for distributed storage support

2010-05-14 Thread Kevin Wolf
Am 14.05.2010 11:51, schrieb MORITA Kazutaka:
 Sheepdog is a distributed storage system for QEMU. It provides highly
 available block level storage volumes to VMs like Amazon EBS.  This
 patch adds a qemu block driver for Sheepdog.
 
 Sheepdog features are:
 - No node in the cluster is special (no metadata node, no control
   node, etc)
 - Linear scalability in performance and capacity
 - No single point of failure
 - Autonomous management (zero configuration)
 - Useful volume management support such as snapshot and cloning
 - Thin provisioning
 - Autonomous load balancing
 
 The more details are available at the project site:
 http://www.osrg.net/sheepdog/
 
 Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp

Once we solved the image creation thing, I think I'm going to be happy
with the block interface. Of course, this is something that doesn't even
directly affect the driver code, just the way it is used.

I have no clue about the Sheepdog protocol, so I'm just trying to
comment on some general details.

 ---
  Makefile.objs|2 +-
  block/sheepdog.c | 1831 
 ++
  2 files changed, 1832 insertions(+), 1 deletions(-)
  create mode 100644 block/sheepdog.c
 
 diff --git a/Makefile.objs b/Makefile.objs
 index ecdd53e..6edbc57 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -14,7 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
  
  block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o 
 vpc.o vvfat.o
  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 -block-nested-y += parallels.o nbd.o blkdebug.o
 +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o
  block-nested-$(CONFIG_WIN32) += raw-win32.o
  block-nested-$(CONFIG_POSIX) += raw-posix.o
  block-nested-$(CONFIG_CURL) += curl.o
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 new file mode 100644
 index 000..adf3a71
 --- /dev/null
 +++ b/block/sheepdog.c
 @@ -0,0 +1,1831 @@
 +/*
 + * Copyright (C) 2009-2010 Nippon Telegraph and Telephone Corporation.
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public License version
 + * 2 as published by the Free Software Foundation.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program. If not, see http://www.gnu.org/licenses/.
 + */
 +#include netdb.h
 +#include netinet/tcp.h
 +
 +#include qemu-common.h
 +#include block_int.h
 +
 +#define SD_PROTO_VER 0x01
 +
 +#define SD_DEFAULT_ADDR localhost:7000
 +
 +#define SD_OP_CREATE_AND_WRITE_OBJ  0x01
 +#define SD_OP_READ_OBJ   0x02
 +#define SD_OP_WRITE_OBJ  0x03
 +
 +#define SD_OP_NEW_VDI0x11
 +#define SD_OP_LOCK_VDI   0x12
 +#define SD_OP_RELEASE_VDI0x13
 +#define SD_OP_GET_VDI_INFO   0x14
 +#define SD_OP_READ_VDIS  0x15
 +
 +#define SD_FLAG_CMD_WRITE0x01
 +#define SD_FLAG_CMD_COW  0x02
 +
 +#define SD_RES_SUCCESS   0x00 /* Success */
 +#define SD_RES_UNKNOWN   0x01 /* Unknown error */
 +#define SD_RES_NO_OBJ0x02 /* No object found */
 +#define SD_RES_EIO   0x03 /* I/O error */
 +#define SD_RES_VDI_EXIST 0x04 /* Vdi exists already */
 +#define SD_RES_INVALID_PARMS 0x05 /* Invalid parameters */
 +#define SD_RES_SYSTEM_ERROR  0x06 /* System error */
 +#define SD_RES_VDI_LOCKED0x07 /* Vdi is locked */
 +#define SD_RES_NO_VDI0x08 /* No vdi found */
 +#define SD_RES_NO_BASE_VDI   0x09 /* No base vdi found */
 +#define SD_RES_VDI_READ  0x0A /* Cannot read requested vdi */
 +#define SD_RES_VDI_WRITE 0x0B /* Cannot write requested vdi */
 +#define SD_RES_BASE_VDI_READ 0x0C /* Cannot read base vdi */
 +#define SD_RES_BASE_VDI_WRITE   0x0D /* Cannot write base vdi */
 +#define SD_RES_NO_TAG0x0E /* Requested tag is not found */
 +#define SD_RES_STARTUP   0x0F /* Sheepdog is on starting up */
 +#define SD_RES_VDI_NOT_LOCKED   0x10 /* Vdi is not locked */
 +#define SD_RES_SHUTDOWN  0x11 /* Sheepdog is shutting down */
 +#define SD_RES_NO_MEM0x12 /* Cannot allocate memory */
 +#define SD_RES_FULL_VDI  0x13 /* we already have the maximum vdis */
 +#define SD_RES_VER_MISMATCH  0x14 /* Protocol version mismatch */
 +#define SD_RES_NO_SPACE  0x15 /* Server has no room for new objects */
 +#define SD_RES_WAIT_FOR_FORMAT  0x16 /* Sheepdog is waiting for a format 
 operation */
 +#define SD_RES_WAIT_FOR_JOIN0x17 /* Sheepdog is waiting for other nodes 
 joining */
 +#define SD_RES_JOIN_FAILED   0x18 /* Target node had failed to join sheepdog 
 */
 +
 +/*
 + * Object ID rules
 + *
 + *  0 - 19 (20 bits): data object space
 + * 20 - 31 (12 bits): reserved data object space
 + * 32 - 55 (24 bits): vdi object space
 + * 56 - 59 ( 4 bits): reserved vdi object space
 + * 60 - 63 ( 4 bits): object type indentifier space
 + */
 +
 +#define VDI_SPACE_SHIFT   32
 +#define VDI_BIT (UINT64_C(1)  63)
 +#define VMSTATE_BIT