Re: [sheepdog] effective storing backups and deduplication
2015-02-11 15:43 GMT+03:00 Liu Yuan namei.u...@gmail.com:

> This scheme can build on sheepdog's current features:
>
> 0. use qemu-img (recommended, because of better performance) or dog to read the base vdi.
> 1. use dog to back up the delta data for the different snapshots taken by qemu-img snapshot or dog vdi snapshot.
> 2. manage the relations between the delta data, the base, and the user-defined snapshots in the upper layer.
> 3. use SD http storage to store the base and delta data.
>
> I guess you need something as the middle layer to map the user-defined snapshots to sheepdog's base and delta data, and to implement GC in this middle layer. Authentication would be better implemented in this middleware.

Nice =)! So my questions are:

0 - do I need to run qemu-img snapshot sheepdog:base.img?
1 - can you provide a dog command line so I understand what I need?

Big Thanks!

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru

-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog
[sheepdog] Build failed in Jenkins: sheepdog-build #631
See http://jenkins.sheepdog-project.org:8080/job/sheepdog-build/631/changes Changes: [liuyuan] dog: type cast miss at vdi_show_progress -- [...truncated 57 lines...] checking for grep that handles long lines and -e... /bin/grep checking for egrep... /bin/grep -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking for size_t... yes checking for working alloca.h... yes checking for alloca... yes checking for dirent.h that defines DIR... yes checking for library containing opendir... none required checking for ANSI C header files... (cached) yes checking for sys/wait.h that is POSIX.1 compatible... yes checking arpa/inet.h usability... yes checking arpa/inet.h presence... yes checking for arpa/inet.h... yes checking fcntl.h usability... yes checking fcntl.h presence... yes checking for fcntl.h... yes checking limits.h usability... yes checking limits.h presence... yes checking for limits.h... yes checking netdb.h usability... yes checking netdb.h presence... yes checking for netdb.h... yes checking netinet/in.h usability... yes checking netinet/in.h presence... yes checking for netinet/in.h... yes checking for stdint.h... (cached) yes checking for stdlib.h... (cached) yes checking for string.h... (cached) yes checking sys/ioctl.h usability... yes checking sys/ioctl.h presence... yes checking for sys/ioctl.h... yes checking sys/param.h usability... yes checking sys/param.h presence... yes checking for sys/param.h... yes checking sys/socket.h usability... yes checking sys/socket.h presence... yes checking for sys/socket.h... yes checking sys/time.h usability... yes checking sys/time.h presence... yes checking for sys/time.h... yes checking syslog.h usability... yes checking syslog.h presence... 
yes checking for syslog.h... yes checking for unistd.h... (cached) yes checking for sys/types.h... (cached) yes checking getopt.h usability... yes checking getopt.h presence... yes checking for getopt.h... yes checking malloc.h usability... yes checking malloc.h presence... yes checking for malloc.h... yes checking sys/sockio.h usability... no checking sys/sockio.h presence... no checking for sys/sockio.h... no checking utmpx.h usability... yes checking utmpx.h presence... yes checking for utmpx.h... yes checking urcu.h usability... yes checking urcu.h presence... yes checking for urcu.h... yes checking urcu/uatomic.h usability... yes checking urcu/uatomic.h presence... yes checking for urcu/uatomic.h... yes checking for an ANSI C-conforming const... yes checking for uid_t in sys/types.h... yes checking for inline... inline checking for size_t... (cached) yes checking whether time.h and sys/time.h may both be included... yes checking for working volatile... yes checking size of short... 2 checking size of int... 4 checking size of long... 8 checking size of long long... 8 checking sys/eventfd.h usability... yes checking sys/eventfd.h presence... yes checking for sys/eventfd.h... yes checking sys/signalfd.h usability... yes checking sys/signalfd.h presence... yes checking for sys/signalfd.h... yes checking sys/timerfd.h usability... yes checking sys/timerfd.h presence... yes checking for sys/timerfd.h... yes checking whether closedir returns void... no checking for error_at_line... yes checking for mbstate_t... yes checking for working POSIX fnmatch... yes checking for pid_t... yes checking vfork.h usability... no checking vfork.h presence... no checking for vfork.h... no checking for fork... yes checking for vfork... yes checking for working fork... yes checking for working vfork... (cached) yes checking whether gcc needs -traditional... no checking for stdlib.h... (cached) yes checking for GNU libc compatible malloc... yes checking for working memcmp... 
yes checking for stdlib.h... (cached) yes checking for GNU libc compatible realloc... yes checking sys/select.h usability... yes checking sys/select.h presence... yes checking for sys/select.h... yes checking for sys/socket.h... (cached) yes checking types of arguments for select... int,fd_set *,struct timeval * checking return type of signal handlers... void checking for vprintf... yes checking for _doprnt... no checking for alarm... yes checking for alphasort... yes checking for atexit... yes checking for bzero... yes checking for dup2... yes checking for endgrent... yes checking for endpwent... yes checking for fcntl... yes checking for getcwd... yes checking for getpeerucred... no checking for getpeereid... no checking for gettimeofday... yes checking for inet_ntoa... yes checking for memmove... yes checking for memset... yes checking for mkdir... yes checking for scandir... yes checking for select... yes checking for socket... yes checking for strcasecmp... yes
Re: [sheepdog] [PATCH] dog: fix to calculate a resizable max VDI size appropriately
On Tue, Feb 10, 2015 at 05:53:44PM +0900, Teruaki Ishizaki wrote:
> The resizable max VDI size was a fixed value, 4TB. So, when a block_size_shift larger than 22 was specified, resizing a VDI beyond 4TB caused an error. This patch calculates the resizable max VDI size properly.
>
> Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
> ---
>  dog/vdi.c | 24 +++-
>  1 files changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/dog/vdi.c b/dog/vdi.c
> index 6cb813e..8e5ab13 100644
> --- a/dog/vdi.c
> +++ b/dog/vdi.c
> @@ -845,8 +845,8 @@ out:
>  static int vdi_resize(int argc, char **argv)
>  {
>  	const char *vdiname = argv[optind++];
> -	uint64_t new_size;
> -	uint32_t vid;
> +	uint64_t new_size, old_max_total_size;
> +	uint32_t vid, object_size;
>  	int ret;
>  	char buf[SD_INODE_HEADER_SIZE];
>  	struct sd_inode *inode = (struct sd_inode *)buf;
> @@ -863,13 +863,19 @@ static int vdi_resize(int argc, char **argv)
>  	if (ret != EXIT_SUCCESS)
>  		return ret;
>
> -	if (new_size > SD_OLD_MAX_VDI_SIZE && 0 == inode->store_policy) {
> -		sd_err("New VDI size is too large");
> -		return EXIT_USAGE;
> -	}
> -
> -	if (new_size > SD_MAX_VDI_SIZE) {
> -		sd_err("New VDI size is too large");
> +	object_size = (UINT32_C(1) << inode->block_size_shift);
> +	old_max_total_size = object_size * OLD_MAX_DATA_OBJS;
> +	if (0 == inode->store_policy) {
> +		if (new_size > old_max_total_size) {
> +			sd_err("New VDI size is too large. "
> +			       "This volume's max size is %" PRIu64,
> +			       old_max_total_size);
> +			return EXIT_USAGE;
> +		}
> +	} else if (new_size > SD_MAX_VDI_SIZE) {
> +		sd_err("New VDI size is too large. "
> +		       "This volume's max size is %llu",
> +		       SD_MAX_VDI_SIZE);
>  		return EXIT_USAGE;
>  	}

Applied, thanks.

BTW, we haven't reached a conclusion on how to make use of this additional field yet. I don't think we should expose block_size_shift to plain users. For users, object size is much more direct and simple to understand. There is no reason we can't expose object size as an option to users, no?

Thanks
Yuan
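The bound computed in the patch above can be sanity-checked with shell arithmetic. A minimal sketch, assuming OLD_MAX_DATA_OBJS is 2^20 (which is what the historical 4TB cap implies at the default 4MB, shift-22 object size):

```shell
# Sketch only: how the resize ceiling scales with block_size_shift.
# OLD_MAX_DATA_OBJS is assumed to be 2^20, consistent with the old
# fixed 4TB limit at the default 4MB (shift 22) object size.
OLD_MAX_DATA_OBJS=$((1 << 20))

for shift in 22 23 24; do
    object_size=$((1 << shift))
    max_total=$((object_size * OLD_MAX_DATA_OBJS))
    echo "block_size_shift=$shift -> max VDI size $((max_total >> 40)) TB"
done
```

At shift 22 this reproduces the old fixed 4TB limit; each extra shift doubles the ceiling, which is exactly why the hardcoded check broke for block_size_shift > 22.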
Re: [sheepdog] effective storing backups and deduplication
On Wed, Feb 11, 2015 at 05:57:25PM +0400, Vasiliy Tolstov wrote:
> 2015-02-11 15:43 GMT+03:00 Liu Yuan namei.u...@gmail.com:
> > This scheme can build on sheepdog's current features:
> >
> > 0. use qemu-img (recommended, because of better performance) or dog to read the base vdi.
> > 1. use dog to back up the delta data for the different snapshots taken by qemu-img snapshot or dog vdi snapshot.
> > 2. manage the relations between the delta data, the base, and the user-defined snapshots in the upper layer.
> > 3. use SD http storage to store the base and delta data.
> >
> > I guess you need something as the middle layer to map the user-defined snapshots to sheepdog's base and delta data, and to implement GC in this middle layer. Authentication would be better implemented in this middleware.
>
> Nice =)! So my questions are:
>
> 0 - do I need to run qemu-img snapshot sheepdog:base.img?

qemu-img snapshot sheepdog:your_vdi will snapshot this vdi and switch to the new vdi; the old vdi will be marked as a snapshot.

> 1 - can you provide a dog command line so I understand what I need?

See https://github.com/sheepdog/sheepdog/wiki/Image-Backup

Thanks
Yuan
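The base + delta workflow discussed in this thread might look roughly as follows. This is an illustrative sketch, not the wiki page verbatim: the snapshot tags and backup paths are made up, and the exact `dog vdi backup`/`dog vdi restore` flags should be checked against `dog vdi backup help` for your sheepdog version.

```shell
# Hypothetical sketch of the base + delta backup flow (needs a running
# sheepdog cluster; names and paths are examples only).
VDI=base.img

# 1. Take two snapshots of the working VDI.
dog vdi snapshot -s snap1 $VDI
# ... the guest writes some data in between ...
dog vdi snapshot -s snap2 $VDI

# 2. Export the full base once, then only the delta between snapshots.
dog vdi read -s snap1 $VDI > /backup/base.dat
dog vdi backup -F snap1 -s snap2 $VDI > /backup/delta-1-2.dat

# 3. Roll a restored copy forward by applying the delta onto the base.
dog vdi restore -s snap2 $VDI < /backup/delta-1-2.dat
```

The middle layer Yuan describes would then track which delta files belong to which user-defined snapshot and garbage-collect deltas whose snapshots are gone.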
Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support
On Thu, Feb 12, 2015 at 10:51:25AM +0900, Teruaki Ishizaki wrote:
> (2015/02/10 20:12), Liu Yuan wrote:
> > On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
> > > Previously, the qemu block driver of sheepdog used a hard-coded VDI object size. This patch enables users to handle the block_size_shift value for calculating the VDI object size.
> > >
> > > When you start qemu, you don't need to specify an additional command option. But when you create a VDI which doesn't have the default object size with the qemu-img command, you specify the block_size_shift option. If you want to create a VDI of 8MB (1 << 23) object size, you need to specify the following command option.
> > >
> > >   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
> > >
> > > In addition, when you don't specify the qemu-img command option, the default value of the sheepdog cluster is used for creating the VDI.
> > >
> > >   # qemu-img create sheepdog:test2 100M
> > >
> > > Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
> > > ---
> > > V4:
> > >  - Limit a read/write buffer size for creating a preallocated VDI.
> > >  - Replace a parse function for the block_size_shift option.
> > >  - Fix an error message.
> > > V3:
> > >  - Delete the needless operation of buffer.
> > >  - Delete the needless operations of request header for SD_OP_GET_CLUSTER_DEFAULT.
> > >  - Fix coding style problems.
> > > V2:
> > >  - Fix coding style problem (white space).
> > >  - Add members, store_policy and block_size_shift, to struct SheepdogVdiReq.
> > >  - Initialize request header to use block_size_shift specified by user.
> > > ---
> > >  block/sheepdog.c          | 138 ++---
> > >  include/block/block_int.h |   1 +
> > >  2 files changed, 119 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/block/sheepdog.c b/block/sheepdog.c
> > > index be3176f..a43b947 100644
> > > --- a/block/sheepdog.c
> > > +++ b/block/sheepdog.c
> > > @@ -37,6 +37,7 @@
> > >  #define SD_OP_READ_VDIS      0x15
> > >  #define SD_OP_FLUSH_VDI      0x16
> > >  #define SD_OP_DEL_VDI        0x17
> > > +#define SD_OP_GET_CLUSTER_DEFAULT 0x18
> >
> > This might not be necessary. For old qemu or qemu-img without the option set, block_size_shift will be 0. If we make 0 represent a 4MB object, then we don't need to get the default cluster object size.
> >
> > We might even get rid of the idea of a cluster default size. The downside is that, if we want to create a vdi with a size different from the default 4MB, we have to write it every time for qemu-img or dog.
> >
> > If we choose to keep the idea of a cluster default size, I think we'd also try to avoid calling this request from QEMU, to make backward compatibility easier. In this scenario, 0 might be used to ask a new sheep to decide to use the cluster default size. Both old and new QEMU will send 0 to sheep, and both old and new sheep can handle 0, though it has different meanings.
> >
> > Table for this bit as 0:
> >
> >   Qe:  qemu
> >   SD:  sheep daemon
> >   CDS: cluster default size
> >   Ign: ignored by the sheep daemon
> >
> >   Qe \ SD   new    old
> >   new       CDS    Ign
> >   old       CDS    NULL
>
> Does Ign mean that the VDI is handled as 4MB object size?

Yes, old sheep can only handle 4MB objects and doesn't check this field at all.

> I think this approach is acceptable. The difference to your patch is that we don't send SD_OP_GET_CLUSTER_DEFAULT to the sheep daemon, and SD_OP_GET_CLUSTER_DEFAULT can be removed.
>
> When users create a new VDI with qemu-img, qemu's sheepdog backend driver calculates the max limit VDI size. But if the block_size_shift option is not specified, qemu's sheepdog backend driver can't calculate the max limit VDI size.

If block_size_shift is not specified, this means:

1. for old sheep, use the 4MB size
2. for new sheep, use the cluster-wide default value

And sheep can then calculate it on its own, no?

Thanks
Yuan
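The compatibility table in the thread can be expressed as a tiny decision function. This is a hypothetical sketch of the proposed semantics, not sheepdog code: `effective_shift` and `CLUSTER_DEFAULT_SHIFT` are made-up names, and the cluster default is assumed here to be the 4MB (shift 22) size.

```shell
# Sketch of the "block_size_shift = 0 means default" compatibility rule:
# a new sheep maps 0 to the cluster default, an old sheep ignores the
# field entirely and always uses 4MB (shift 22) objects.
CLUSTER_DEFAULT_SHIFT=22   # assumption: cluster formatted with 4MB objects

effective_shift() {
    req_shift=$1   # shift requested by the client; 0 means "unspecified"
    sheep_gen=$2   # "new" or "old" sheep daemon

    if [ "$sheep_gen" = "old" ]; then
        echo 22                         # field ignored: 4MB objects only
    elif [ "$req_shift" -eq 0 ]; then
        echo "$CLUSTER_DEFAULT_SHIFT"   # new sheep falls back to default
    else
        echo "$req_shift"
    fi
}

effective_shift 0 new    # old qemu against new sheep: cluster default
effective_shift 23 new   # qemu-img -o block_size_shift=23: honoured
effective_shift 23 old   # new qemu against old sheep: option ignored
```

This makes the backward-compatibility claim concrete: no request type is needed just to fetch the default, because 0 already encodes "let the sheep decide".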
Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support
On Thu, Feb 12, 2015 at 11:33:16AM +0900, Teruaki Ishizaki wrote:
> (2015/02/12 11:19), Liu Yuan wrote:
> > On Thu, Feb 12, 2015 at 10:51:25AM +0900, Teruaki Ishizaki wrote:
> > > (2015/02/10 20:12), Liu Yuan wrote:
> > > > On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
> > > > > Previously, the qemu block driver of sheepdog used a hard-coded VDI object size. This patch enables users to handle the block_size_shift value for calculating the VDI object size. When you start qemu, you don't need to specify an additional command option. But when you create a VDI which doesn't have the default object size with the qemu-img command, you specify the block_size_shift option. If you want to create a VDI of 8MB (1 << 23) object size, you need to specify the following command option.
> > > > >
> > > > >   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
> > > > >
> > > > > In addition, when you don't specify the qemu-img command option, the default value of the sheepdog cluster is used for creating the VDI.
> > > > >
> > > > >   # qemu-img create sheepdog:test2 100M
> > > > >
> > > > > Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
> > > > > ---
> > > > > V4: - Limit a read/write buffer size for creating a preallocated VDI. - Replace a parse function for the block_size_shift option. - Fix an error message.
> > > > > V3: - Delete the needless operation of buffer. - Delete the needless operations of request header for SD_OP_GET_CLUSTER_DEFAULT. - Fix coding style problems.
> > > > > V2: - Fix coding style problem (white space). - Add members, store_policy and block_size_shift, to struct SheepdogVdiReq. - Initialize request header to use block_size_shift specified by user.
> > > > > ---
> > > > >  block/sheepdog.c          | 138 ++---
> > > > >  include/block/block_int.h |   1 +
> > > > >  2 files changed, 119 insertions(+), 20 deletions(-)
> > > > >
> > > > > diff --git a/block/sheepdog.c b/block/sheepdog.c
> > > > > index be3176f..a43b947 100644
> > > > > --- a/block/sheepdog.c
> > > > > +++ b/block/sheepdog.c
> > > > > @@ -37,6 +37,7 @@
> > > > >  #define SD_OP_READ_VDIS      0x15
> > > > >  #define SD_OP_FLUSH_VDI      0x16
> > > > >  #define SD_OP_DEL_VDI        0x17
> > > > > +#define SD_OP_GET_CLUSTER_DEFAULT 0x18
> > > >
> > > > This might not be necessary. For old qemu or qemu-img without the option set, block_size_shift will be 0. If we make 0 represent a 4MB object, then we don't need to get the default cluster object size. We might even get rid of the idea of a cluster default size. The downside is that, if we want to create a vdi with a size different from the default 4MB, we have to write it every time for qemu-img or dog.
> > > >
> > > > If we choose to keep the idea of a cluster default size, I think we'd also try to avoid calling this request from QEMU, to make backward compatibility easier. In this scenario, 0 might be used to ask a new sheep to decide to use the cluster default size. Both old and new QEMU will send 0 to sheep, and both old and new sheep can handle 0, though it has different meanings.
> > > >
> > > > Table for this bit as 0:
> > > >
> > > >   Qe:  qemu
> > > >   SD:  sheep daemon
> > > >   CDS: cluster default size
> > > >   Ign: ignored by the sheep daemon
> > > >
> > > >   Qe \ SD   new    old
> > > >   new       CDS    Ign
> > > >   old       CDS    NULL
> > >
> > > Does Ign mean that the VDI is handled as 4MB object size?
> >
> > Yes, old sheep can only handle 4MB objects and doesn't check this field at all.
> >
> > > I think this approach is acceptable. The difference to your patch is that we don't send SD_OP_GET_CLUSTER_DEFAULT to the sheep daemon, and SD_OP_GET_CLUSTER_DEFAULT can be removed.
> > >
> > > When users create a new VDI with qemu-img, qemu's sheepdog backend driver calculates the max limit VDI size. But if the block_size_shift option is not specified, qemu's sheepdog backend driver can't calculate the max limit VDI size.
> >
> > If block_size_shift is not specified, this means: 1. for old sheep, use the 4MB size; 2. for new sheep, use the cluster-wide default value. And sheep can then calculate it on its own, no?
>
> The dog command (client) calculates the max size, so I think that qemu's sheepdog backend driver should calculate it like the dog command. Is that policy changeable?

I checked the QEMU code and got your idea. In the past it was a fixed size, so it was very easy to hardcode the check in the client; no communication with sheep was needed. Yes, if it is reasonable, we can change it.

I think we can push the size calculation logic into sheep; if the size is not right, sheep returns INVALID_PARAMETER to clients. Clients just check this and report the error back to users. There is no backward compatibility problem with this approach, since 4MB is the smallest size: old QEMU will limit the max size to 4TB, which is no problem for new sheep.

Thanks
Yuan
[sheepdog] who is in charge of Jenkins-sheepdog
Hi all,

Who has the permission to install yasm on the sheepdog Jenkins server? Our list is annoyed by the build-failure mail every day; please save us from it.

Thanks
Yuan
Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support
(2015/02/10 20:12), Liu Yuan wrote:
> On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
> > Previously, the qemu block driver of sheepdog used a hard-coded VDI object size. This patch enables users to handle the block_size_shift value for calculating the VDI object size. When you start qemu, you don't need to specify an additional command option. But when you create a VDI which doesn't have the default object size with the qemu-img command, you specify the block_size_shift option. If you want to create a VDI of 8MB (1 << 23) object size, you need to specify the following command option.
> >
> >   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
> >
> > In addition, when you don't specify the qemu-img command option, the default value of the sheepdog cluster is used for creating the VDI.
> >
> >   # qemu-img create sheepdog:test2 100M
> >
> > Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
> > ---
> > V4: - Limit a read/write buffer size for creating a preallocated VDI. - Replace a parse function for the block_size_shift option. - Fix an error message.
> > V3: - Delete the needless operation of buffer. - Delete the needless operations of request header for SD_OP_GET_CLUSTER_DEFAULT. - Fix coding style problems.
> > V2: - Fix coding style problem (white space). - Add members, store_policy and block_size_shift, to struct SheepdogVdiReq. - Initialize request header to use block_size_shift specified by user.
> > ---
> >  block/sheepdog.c          | 138 ++---
> >  include/block/block_int.h |   1 +
> >  2 files changed, 119 insertions(+), 20 deletions(-)
> >
> > diff --git a/block/sheepdog.c b/block/sheepdog.c
> > index be3176f..a43b947 100644
> > --- a/block/sheepdog.c
> > +++ b/block/sheepdog.c
> > @@ -37,6 +37,7 @@
> >  #define SD_OP_READ_VDIS      0x15
> >  #define SD_OP_FLUSH_VDI      0x16
> >  #define SD_OP_DEL_VDI        0x17
> > +#define SD_OP_GET_CLUSTER_DEFAULT 0x18
>
> This might not be necessary. For old qemu or qemu-img without the option set, block_size_shift will be 0. If we make 0 represent a 4MB object, then we don't need to get the default cluster object size.
>
> We might even get rid of the idea of a cluster default size. The downside is that, if we want to create a vdi with a size different from the default 4MB, we have to write it every time for qemu-img or dog.
>
> If we choose to keep the idea of a cluster default size, I think we'd also try to avoid calling this request from QEMU, to make backward compatibility easier. In this scenario, 0 might be used to ask a new sheep to decide to use the cluster default size. Both old and new QEMU will send 0 to sheep, and both old and new sheep can handle 0, though it has different meanings.
>
> Table for this bit as 0:
>
>   Qe:  qemu
>   SD:  sheep daemon
>   CDS: cluster default size
>   Ign: ignored by the sheep daemon
>
>   Qe \ SD   new    old
>   new       CDS    Ign
>   old       CDS    NULL

Does Ign mean that the VDI is handled as 4MB object size?

I think this approach is acceptable. The difference to your patch is that we don't send SD_OP_GET_CLUSTER_DEFAULT to the sheep daemon, and SD_OP_GET_CLUSTER_DEFAULT can be removed.

When users create a new VDI with qemu-img, qemu's sheepdog backend driver calculates the max limit VDI size. But if the block_size_shift option is not specified, qemu's sheepdog backend driver can't calculate the max limit VDI size. So, I think that qemu's sheepdog backend driver must get the cluster default value from the sheep daemon.

Thanks,
Teruaki
Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support
(2015/02/12 11:19), Liu Yuan wrote:
> On Thu, Feb 12, 2015 at 10:51:25AM +0900, Teruaki Ishizaki wrote:
> > (2015/02/10 20:12), Liu Yuan wrote:
> > > On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
> > > > Previously, the qemu block driver of sheepdog used a hard-coded VDI object size. This patch enables users to handle the block_size_shift value for calculating the VDI object size. When you start qemu, you don't need to specify an additional command option. But when you create a VDI which doesn't have the default object size with the qemu-img command, you specify the block_size_shift option. If you want to create a VDI of 8MB (1 << 23) object size, you need to specify the following command option.
> > > >
> > > >   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
> > > >
> > > > In addition, when you don't specify the qemu-img command option, the default value of the sheepdog cluster is used for creating the VDI.
> > > >
> > > >   # qemu-img create sheepdog:test2 100M
> > > >
> > > > Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
> > > > ---
> > > > V4: - Limit a read/write buffer size for creating a preallocated VDI. - Replace a parse function for the block_size_shift option. - Fix an error message.
> > > > V3: - Delete the needless operation of buffer. - Delete the needless operations of request header for SD_OP_GET_CLUSTER_DEFAULT. - Fix coding style problems.
> > > > V2: - Fix coding style problem (white space). - Add members, store_policy and block_size_shift, to struct SheepdogVdiReq. - Initialize request header to use block_size_shift specified by user.
> > > > ---
> > > >  block/sheepdog.c          | 138 ++---
> > > >  include/block/block_int.h |   1 +
> > > >  2 files changed, 119 insertions(+), 20 deletions(-)
> > > >
> > > > diff --git a/block/sheepdog.c b/block/sheepdog.c
> > > > index be3176f..a43b947 100644
> > > > --- a/block/sheepdog.c
> > > > +++ b/block/sheepdog.c
> > > > @@ -37,6 +37,7 @@
> > > >  #define SD_OP_READ_VDIS      0x15
> > > >  #define SD_OP_FLUSH_VDI      0x16
> > > >  #define SD_OP_DEL_VDI        0x17
> > > > +#define SD_OP_GET_CLUSTER_DEFAULT 0x18
> > >
> > > This might not be necessary. For old qemu or qemu-img without the option set, block_size_shift will be 0. If we make 0 represent a 4MB object, then we don't need to get the default cluster object size. We might even get rid of the idea of a cluster default size. The downside is that, if we want to create a vdi with a size different from the default 4MB, we have to write it every time for qemu-img or dog.
> > >
> > > If we choose to keep the idea of a cluster default size, I think we'd also try to avoid calling this request from QEMU, to make backward compatibility easier. In this scenario, 0 might be used to ask a new sheep to decide to use the cluster default size. Both old and new QEMU will send 0 to sheep, and both old and new sheep can handle 0, though it has different meanings.
> > >
> > > Table for this bit as 0:
> > >
> > >   Qe:  qemu
> > >   SD:  sheep daemon
> > >   CDS: cluster default size
> > >   Ign: ignored by the sheep daemon
> > >
> > >   Qe \ SD   new    old
> > >   new       CDS    Ign
> > >   old       CDS    NULL
> >
> > Does Ign mean that the VDI is handled as 4MB object size?
>
> Yes, old sheep can only handle 4MB objects and doesn't check this field at all.
>
> > I think this approach is acceptable. The difference to your patch is that we don't send SD_OP_GET_CLUSTER_DEFAULT to the sheep daemon, and SD_OP_GET_CLUSTER_DEFAULT can be removed.
> >
> > When users create a new VDI with qemu-img, qemu's sheepdog backend driver calculates the max limit VDI size. But if the block_size_shift option is not specified, qemu's sheepdog backend driver can't calculate the max limit VDI size.
>
> If block_size_shift is not specified, this means: 1. for old sheep, use the 4MB size; 2. for new sheep, use the cluster-wide default value. And sheep can then calculate it on its own, no?

The dog command (client) calculates the max size, so I think that qemu's sheepdog backend driver should calculate it like the dog command. Is that policy changeable? Is there no policy?

Thanks,
Teruaki
Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots
On Thu, Feb 12, 2015 at 04:40:56PM +0900, Hitoshi Mitake wrote:
> At Thu, 12 Feb 2015 15:31:15 +0800, Liu Yuan wrote:
> > On Thu, Feb 12, 2015 at 03:59:51PM +0900, Hitoshi Mitake wrote:
> > > At Thu, 12 Feb 2015 14:38:37 +0800, Liu Yuan wrote:
> > > > On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote:
> > > > > The current dog vdi snapshot command creates a new snapshot unconditionally, even if the working VDI doesn't have its own objects. In such a case, the created snapshot is redundant because an identical VDI already exists.
> > > >
> > > > What kind of use case will create two identical snapshots? This logic is simple and the code is clean, but I doubt there are real users of this option.
> > >
> > > Generally speaking, taking snapshots periodically is an ordinary use case of enterprise SAN. Of course sheepdog can support this use case: a cron job (e.g. daily) which invokes dog vdi snapshot simply enables it. But if a VDI doesn't have COWed objects, the snapshot will be redundant. So I want to add this option.
> >
> > Okay, your patch makes sense for periodic snapshots. But if dog has found identical snapshots, it won't create a new one and will return success to the caller. I assume the caller is some middleware; if no new vdi is returned, will this cause trouble for it? This means it will need to call 'vdi list' to check whether a new vdi was created or not?
>
> So I'm adding this feature with the new option; the existing semantics aren't affected. And if the checking process (has_own_objects()) faces an error, it is reported correctly to the middleware.
>
> > I'm not against this patch, but I have some questions. For identical snapshots, the overhead is just an inode object created, no? Looks to me like the overhead is quite small and no special option is needed to remove it.
>
> Taking snapshots of thousands of VDIs will consume thousands of VIDs, and create thousands * replication factor of inodes. I'm not sure the consumption of VIDs will become a serious problem, but the inodes will make replication time longer (e.g. 16:4 EC requires 20 inodes).

Yes, this is the point. Makes sense to me. I have some comments on the code in my last email; could you submit a V2? BTW, it would be great if you could include the above rationale in the commit log.

Thanks
Yuan
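The scale argument above is easy to make concrete. A back-of-the-envelope sketch (the fleet size of 1000 VDIs is an arbitrary example; the 16:4 erasure-coding figure comes from the thread):

```shell
# Each redundant snapshot consumes one VID and stores one inode object
# per copy; with 16:4 erasure coding that is 20 inode objects per VDI.
VDIS=1000                        # example fleet size (hypothetical)
INODES_PER_SNAPSHOT=$((16 + 4))  # 16 data + 4 parity strips

echo "$((VDIS * INODES_PER_SNAPSHOT)) inode objects per snapshot round"
```

So a daily cron job over such a fleet would write 20,000 inode objects per day even when no guest wrote any data, which is exactly the waste the new option avoids.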
[sheepdog] [PATCH 4/5] tests: fix content of 052.out
Since the code is printf("%s\n", sd_strerror(rsp->result));, 052.out should gain a new line.

Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com
---
 tests/functional/052.out | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/functional/052.out b/tests/functional/052.out
index 2a533d5..f4487d0 100644
--- a/tests/functional/052.out
+++ b/tests/functional/052.out
@@ -52,6 +52,7 @@ Failed to read object 807c2b25 Waiting for other nodes to join cluster
 Failed to read inode header
 Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag   Block Size Shift
 Cluster status: Waiting for other nodes to join cluster
+
 Failed to read object 807c2b25 Waiting for other nodes to join cluster
 Failed to read inode header
 Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag   Block Size Shift
-- 
2.1.0
[sheepdog] [PATCH 0/5] tests: fix some test cases to suitable for new sheepdog and QEMU
QEMU and sheepdog changed some output formats while upgrading to new versions, so the tests/functional test cases need some changes.

Wang dongxu (5):
  tests: avoid qemu-io warning
  tests: avoid qemu-img snapshot warning
  tests: correct vdi list
  tests: fix content of 052.out
  tests: fix vnode strategy output

 tests/functional/013     |  8
 tests/functional/017     | 14 +++---
 tests/functional/024     |  6 +++---
 tests/functional/025     |  4 ++--
 tests/functional/030.out |  1 +
 tests/functional/039     | 22 +++---
 tests/functional/052.out |  1 +
 tests/functional/058     |  2 +-
 tests/functional/059     |  2 +-
 tests/functional/073.out |  2 +-
 tests/functional/075     |  2 +-
 tests/functional/081.out |  6 +++---
 tests/functional/082.out |  6 +++---
 tests/functional/087.out | 10 +-
 tests/functional/089.out |  2 +-
 tests/functional/090.out |  6 +++---
 tests/functional/096.out |  2 ++
 tests/functional/099.out |  4 ++--
 18 files changed, 52 insertions(+), 48 deletions(-)

-- 
2.1.0
[sheepdog] [PATCH 1/5] tests: avoid qemu-io warning
The qemu-io command added a warning message because probing a raw image is dangerous, so add the -f option to avoid this.

Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com
---
 tests/functional/013 | 6 +++---
 tests/functional/017 | 2 +-
 tests/functional/024 | 6 +++---
 tests/functional/025 | 4 ++--
 tests/functional/039 | 22 +++---
 tests/functional/058 | 2 +-
 tests/functional/059 | 2 +-
 tests/functional/075 | 2 +-
 8 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/tests/functional/013 b/tests/functional/013
index b35b806..f724841 100755
--- a/tests/functional/013
+++ b/tests/functional/013
@@ -14,11 +14,11 @@ _cluster_format -c 1
 _vdi_create test 4G

 for i in `seq 1 9`; do
-$QEMU_IO -c write 0 512 -P $i sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P $i sheepdog:test | _filter_qemu_io
 $QEMU_IMG snapshot -c tag$i sheepdog:test
 done

-$QEMU_IO -c read 0 512 -P 9 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 9 sheepdog:test | _filter_qemu_io
 for i in `seq 1 9`; do
-$QEMU_IO -c read 0 512 -P $i sheepdog:test:tag$i | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P $i sheepdog:test:tag$i | _filter_qemu_io
 done

diff --git a/tests/functional/017 b/tests/functional/017
index 5ebe7da..1c22c76 100755
--- a/tests/functional/017
+++ b/tests/functional/017
@@ -20,7 +20,7 @@ $QEMU_IMG snapshot -c tag3 sheepdog:test
 _vdi_create test2 4G
 $QEMU_IMG snapshot -c tag1 sheepdog:test2
 $QEMU_IMG snapshot -c tag2 sheepdog:test2
-$QEMU_IO -c write 0 512 sheepdog:test2:1 | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 sheepdog:test2:1 | _filter_qemu_io
 $QEMU_IMG snapshot -c tag3 sheepdog:test2
 $DOG vdi tree | _filter_short_date

diff --git a/tests/functional/024 b/tests/functional/024
index e1c1180..e8a33c4 100755
--- a/tests/functional/024
+++ b/tests/functional/024
@@ -23,14 +23,14 @@ _vdi_create ${VDI_NAME} ${VDI_SIZE}
 sleep 1

 echo filling ${VDI_NAME} with data
-$QEMU_IO -c write 0 ${VDI_SIZE} sheepdog:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 ${VDI_SIZE} sheepdog:${VDI_NAME} | _filter_qemu_io

 echo reading back ${VDI_NAME}
-$QEMU_IO -c read 0 1m sheepdog:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 1m sheepdog:${VDI_NAME} | _filter_qemu_io

 echo starting second sheep
 _start_sheep 6
 _wait_for_sheep 7

 echo reading data from second sheep
-$QEMU_IO -c read 0 ${VDI_SIZE} sheepdog:localhost:7001:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 ${VDI_SIZE} sheepdog:localhost:7001:${VDI_NAME} | _filter_qemu_io

diff --git a/tests/functional/025 b/tests/functional/025
index 8f89ccb..37af0ea 100755
--- a/tests/functional/025
+++ b/tests/functional/025
@@ -26,10 +26,10 @@ echo creating vdi ${NAME}
 $DOG vdi create ${VDI_NAME} ${VDI_SIZE}

 echo filling ${VDI_NAME} with data
-$QEMU_IO -c write 0 ${VDI_SIZE} sheepdog:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 ${VDI_SIZE} sheepdog:${VDI_NAME} | _filter_qemu_io

 echo reading back ${VDI_NAME} from second zone
-$QEMU_IO -c read 0 1m sheepdog:localhost:7002:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 1m sheepdog:localhost:7002:${VDI_NAME} | _filter_qemu_io

 echo starting a sheep in the third zone
 for i in `seq 3 3`; do

diff --git a/tests/functional/039 b/tests/functional/039
index 5b2540f..fddd4fb 100755
--- a/tests/functional/039
+++ b/tests/functional/039
@@ -13,37 +13,37 @@ _wait_for_sheep 6
 _cluster_format -c 6
 _vdi_create test 4G

-$QEMU_IO -c write 0 512 -P 1 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P 1 sheepdog:test | _filter_qemu_io
 $DOG vdi snapshot test -s snap1
-$QEMU_IO -c write 0 512 -P 2 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P 2 sheepdog:test | _filter_qemu_io
 echo yes | $DOG vdi rollback test -s snap1
-$QEMU_IO -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
 $DOG vdi tree | _filter_short_date
 _vdi_list

-$QEMU_IO -c write 0 512 -P 2 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P 2 sheepdog:test | _filter_qemu_io
 $DOG vdi snapshot test -s snap2
-$QEMU_IO -c write 0 512 -P 3 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P 3 sheepdog:test | _filter_qemu_io
 echo yes | $DOG vdi rollback test -s snap1
-$QEMU_IO -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
 $DOG vdi tree | _filter_short_date
 _vdi_list

 echo yes | $DOG vdi rollback test -s snap2
-$QEMU_IO -c read 0 512 -P 2 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 2 sheepdog:test | _filter_qemu_io
 $DOG vdi tree | _filter_short_date
 _vdi_list

 echo yes | $DOG vdi rollback test -s snap1
-$QEMU_IO -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
 $DOG vdi tree | _filter_short_date
 _vdi_list
Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support
At Thu, 12 Feb 2015 15:00:49 +0800, Liu Yuan wrote: On Thu, Feb 12, 2015 at 03:19:21PM +0900, Hitoshi Mitake wrote: At Tue, 10 Feb 2015 18:35:58 +0800, Liu Yuan wrote: On Tue, Feb 10, 2015 at 06:56:33PM +0900, Teruaki Ishizaki wrote: (2015/02/10 17:58), Liu Yuan wrote: On Tue, Feb 10, 2015 at 05:22:02PM +0900, Teruaki Ishizaki wrote: (2015/02/10 12:10), Liu Yuan wrote: On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote: Previously, the qemu block driver of sheepdog used a hard-coded VDI object size. This patch enables users to set the block_size_shift value used for calculating the VDI object size. When you start qemu, you don't need to specify an additional command option. But when you create a VDI which doesn't have the default object size with the qemu-img command, you specify the block_size_shift option. If you want to create a VDI with an 8 MB (1 << 23) object size, you need to specify the following command option. # qemu-img create -o block_size_shift=23 sheepdog:test1 100M In addition, when you don't specify the qemu-img command option, the default value of the sheepdog cluster is used for creating the VDI. # qemu-img create sheepdog:test2 100M Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp --- V4: - Limit a read/write buffer size for creating a preallocated VDI. - Replace a parse function for the block_size_shift option. - Fix an error message. V3: - Delete the needless operation of buffer. - Delete the needless operations of the request header for SD_OP_GET_CLUSTER_DEFAULT. - Fix coding style problems. V2: - Fix coding style problem (white space). - Add members, store_policy and block_size_shift, to struct SheepdogVdiReq. - Initialize the request header to use the block_size_shift specified by the user. 
--- block/sheepdog.c | 138 ++--- include/block/block_int.h |1 + 2 files changed, 119 insertions(+), 20 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index be3176f..a43b947 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -37,6 +37,7 @@ #define SD_OP_READ_VDIS 0x15 #define SD_OP_FLUSH_VDI 0x16 #define SD_OP_DEL_VDI0x17 +#define SD_OP_GET_CLUSTER_DEFAULT 0x18 #define SD_FLAG_CMD_WRITE0x01 #define SD_FLAG_CMD_COW 0x02 @@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq { uint32_t base_vdi_id; uint8_t copies; uint8_t copy_policy; -uint8_t reserved[2]; +uint8_t store_policy; +uint8_t block_size_shift; uint32_t snapid; uint32_t type; uint32_t pad[2]; @@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp { uint32_t pad[5]; } SheepdogVdiRsp; +typedef struct SheepdogClusterRsp { +uint8_t proto_ver; +uint8_t opcode; +uint16_t flags; +uint32_t epoch; +uint32_t id; +uint32_t data_length; +uint32_t result; +uint8_t nr_copies; +uint8_t copy_policy; +uint8_t block_size_shift; +uint8_t __pad1; +uint32_t __pad2[6]; +} SheepdogClusterRsp; + typedef struct SheepdogInode { char name[SD_MAX_VDI_LEN]; char tag[SD_MAX_VDI_TAG_LEN]; @@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot, hdr.vdi_size = s-inode.vdi_size; hdr.copy_policy = s-inode.copy_policy; hdr.copies = s-inode.nr_copies; +hdr.block_size_shift = s-inode.block_size_shift; ret = do_req(fd, s-aio_context, (SheepdogReq *)hdr, buf, wlen, rlen); @@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot, static int sd_prealloc(const char *filename, Error **errp) { BlockDriverState *bs = NULL; +BDRVSheepdogState *base = NULL; +unsigned long buf_size; uint32_t idx, max_idx; +uint32_t object_size; int64_t vdi_size; -void *buf = g_malloc0(SD_DATA_OBJ_SIZE); +void *buf = NULL; int ret; ret = bdrv_open(bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL, @@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, Error 
**errp) ret = vdi_size; goto out; } -max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE); + +base = bs->opaque; +object_size = (UINT32_C(1) << base->inode.block_size_shift); +buf_size = MIN(object_size, SD_DATA_OBJ_SIZE); +buf = g_malloc0(buf_size); + +max_idx = DIV_ROUND_UP(vdi_size, buf_size); for (idx = 0; idx < max_idx; idx++) {
Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots
On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote: Currently the dog vdi snapshot command creates a new snapshot unconditionally, even if a working VDI doesn't have its own objects. In such a case, the created snapshot is redundant because the same VDI already exists. What kind of use case will create two identical snapshots? This logic is simple and the code is clean, but I doubt whether there are real users of this option. This patch adds a new option -R to the dog command for reducing the identical snapshots. Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- dog/vdi.c | 48 +++- 1 file changed, 47 insertions(+), 1 deletion(-) diff --git a/dog/vdi.c b/dog/vdi.c index 8e612af..ee465c2 100644 --- a/dog/vdi.c +++ b/dog/vdi.c @@ -40,6 +40,8 @@ static struct sd_option vdi_options[] = { neither comparing nor repairing}, {'z', block_size_shift, true, specify the bit shift num for data object size}, + {'R', reduce-identical-snapshots, false, do not create snapshot if + working VDI doesn't have its own objects}, { 0, NULL, false, NULL }, }; @@ -61,6 +63,7 @@ static struct vdi_cmd_data { uint64_t oid; bool no_share; bool exist; + bool reduce_identical_snapshots; } vdi_cmd_data = { ~0, }; struct get_vdi_info { @@ -605,6 +608,31 @@ fail: return NULL; } +static bool has_own_objects(uint32_t vid, int *ret) Traditionally, we'll have functions return SD_RES_xxx because in this way we could propagate the ret to upper callers. So it is better to have has_own_objects return SD_RES_xxx for consistency. Thanks, Yuan -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support
On Thu, Feb 12, 2015 at 04:28:01PM +0900, Hitoshi Mitake wrote: At Thu, 12 Feb 2015 15:00:49 +0800, Liu Yuan wrote: On Thu, Feb 12, 2015 at 03:19:21PM +0900, Hitoshi Mitake wrote: At Tue, 10 Feb 2015 18:35:58 +0800, Liu Yuan wrote: On Tue, Feb 10, 2015 at 06:56:33PM +0900, Teruaki Ishizaki wrote: (2015/02/10 17:58), Liu Yuan wrote: On Tue, Feb 10, 2015 at 05:22:02PM +0900, Teruaki Ishizaki wrote: (2015/02/10 12:10), Liu Yuan wrote: On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote: Previously, the qemu block driver of sheepdog used a hard-coded VDI object size. This patch enables users to set the block_size_shift value used for calculating the VDI object size. When you start qemu, you don't need to specify an additional command option. But when you create a VDI which doesn't have the default object size with the qemu-img command, you specify the block_size_shift option. If you want to create a VDI with an 8 MB (1 << 23) object size, you need to specify the following command option. # qemu-img create -o block_size_shift=23 sheepdog:test1 100M In addition, when you don't specify the qemu-img command option, the default value of the sheepdog cluster is used for creating the VDI. # qemu-img create sheepdog:test2 100M Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp --- V4: - Limit a read/write buffer size for creating a preallocated VDI. - Replace a parse function for the block_size_shift option. - Fix an error message. V3: - Delete the needless operation of buffer. - Delete the needless operations of the request header for SD_OP_GET_CLUSTER_DEFAULT. - Fix coding style problems. V2: - Fix coding style problem (white space). - Add members, store_policy and block_size_shift, to struct SheepdogVdiReq. - Initialize the request header to use the block_size_shift specified by the user. 
--- block/sheepdog.c | 138 ++--- include/block/block_int.h |1 + 2 files changed, 119 insertions(+), 20 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index be3176f..a43b947 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -37,6 +37,7 @@ #define SD_OP_READ_VDIS 0x15 #define SD_OP_FLUSH_VDI 0x16 #define SD_OP_DEL_VDI0x17 +#define SD_OP_GET_CLUSTER_DEFAULT 0x18 #define SD_FLAG_CMD_WRITE0x01 #define SD_FLAG_CMD_COW 0x02 @@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq { uint32_t base_vdi_id; uint8_t copies; uint8_t copy_policy; -uint8_t reserved[2]; +uint8_t store_policy; +uint8_t block_size_shift; uint32_t snapid; uint32_t type; uint32_t pad[2]; @@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp { uint32_t pad[5]; } SheepdogVdiRsp; +typedef struct SheepdogClusterRsp { +uint8_t proto_ver; +uint8_t opcode; +uint16_t flags; +uint32_t epoch; +uint32_t id; +uint32_t data_length; +uint32_t result; +uint8_t nr_copies; +uint8_t copy_policy; +uint8_t block_size_shift; +uint8_t __pad1; +uint32_t __pad2[6]; +} SheepdogClusterRsp; + typedef struct SheepdogInode { char name[SD_MAX_VDI_LEN]; char tag[SD_MAX_VDI_TAG_LEN]; @@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot, hdr.vdi_size = s-inode.vdi_size; hdr.copy_policy = s-inode.copy_policy; hdr.copies = s-inode.nr_copies; +hdr.block_size_shift = s-inode.block_size_shift; ret = do_req(fd, s-aio_context, (SheepdogReq *)hdr, buf, wlen, rlen); @@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot, static int sd_prealloc(const char *filename, Error **errp) { BlockDriverState *bs = NULL; +BDRVSheepdogState *base = NULL; +unsigned long buf_size; uint32_t idx, max_idx; +uint32_t object_size; int64_t vdi_size; -void *buf = g_malloc0(SD_DATA_OBJ_SIZE); +void *buf = NULL; int ret; ret = bdrv_open(bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL, @@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, Error 
**errp) ret = vdi_size; goto out; } -max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE); + +base = bs->opaque; +object_size = (UINT32_C(1) <<
Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots
At Thu, 12 Feb 2015 15:31:15 +0800, Liu Yuan wrote: On Thu, Feb 12, 2015 at 03:59:51PM +0900, Hitoshi Mitake wrote: At Thu, 12 Feb 2015 14:38:37 +0800, Liu Yuan wrote: On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote: Currently the dog vdi snapshot command creates a new snapshot unconditionally, even if a working VDI doesn't have its own objects. In such a case, the created snapshot is redundant because the same VDI already exists. What kind of use case will create two identical snapshots? This logic is simple and the code is clean, but I doubt whether there are real users of this option. Generally speaking, taking snapshots periodically is an ordinary use case for enterprise SANs. Of course sheepdog can support this use case. In the case of sheepdog, a cron job (e.g. daily) which invokes dog vdi snapshot simply enables it. But if a VDI doesn't have COWed objects, the snapshot will be redundant. So I want to add this option. Okay, your patch makes sense for periodic snapshots. But if dog has found identical snapshots, it won't create a new one and will return success to the caller. I assume the caller is some middleware; if no new vdi is returned, will this cause trouble for it? This means it will need to call 'vdi list' to check whether a new vdi was created or not? So I'm adding this feature with the new option. The existing semantics aren't affected. And if the checking process (has_own_objects()) hits an error, it is reported correctly to the middleware. I'm not against this patch, but I have some questions. For identical snapshots, the overhead is just an inode object created, no? It looks to me like the overhead is quite small and there is no need for a special option to remove it. Taking snapshots of thousands of VDIs will consume thousands of VIDs, and create thousands * replication factor of inodes. I'm not sure the consumption of VIDs will become a serious problem, but the inodes will make replication time longer (e.g. 16:4 ec requires 20 inodes). 
Thanks, Hitoshi Thanks, Yuan -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
[sheepdog] [PATCH] sheepdog: show more detail of the crash source
In the current sheepdog, when sheepdog crashes, there is too little information about the signal source. This patch uses (*handler)(int, siginfo_t *, void *) instead of (*handler)(int). In this way, it can show more detail about the crash, especially the pid of the signal sender. Cc: Hitoshi Mitake mitake.hito...@gmail.com Signed-off-by: Wang Zhengyong wangzhengy...@cmss.chinamobile.com --- dog/dog.c |2 +- include/util.h |4 ++-- lib/logger.c|4 ++-- lib/util.c | 11 +++ sheep/sheep.c |9 + shepherd/shepherd.c |2 +- 6 files changed, 18 insertions(+), 14 deletions(-) diff --git a/dog/dog.c b/dog/dog.c index 54520dd..77aa27b 100644 --- a/dog/dog.c +++ b/dog/dog.c @@ -368,7 +368,7 @@ static const struct sd_option *build_sd_options(const char *opts) return sd_opts; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { sd_err(dog exits unexpectedly (%s)., strsignal(signo)); diff --git a/include/util.h b/include/util.h index 6a513e0..3c34b40 100644 --- a/include/util.h +++ b/include/util.h @@ -108,8 +108,8 @@ int rmdir_r(const char *dir_path); int purge_directory(const char *dir_path); bool is_numeric(const char *p); const char *data_to_str(void *data, size_t data_length); -int install_sighandler(int signum, void (*handler)(int), bool once); -int install_crash_handler(void (*handler)(int)); +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once); +int install_crash_handler(void (*handler)(int, siginfo_t *, void *)); void reraise_crash_signal(int signo, int status); pid_t gettid(void); int tkill(int tid, int sig); diff --git a/lib/logger.c b/lib/logger.c index 02bab00..da0ebac 100644 --- a/lib/logger.c +++ b/lib/logger.c @@ -531,7 +531,7 @@ static bool is_sheep_dead(int signo) return signo == SIGHUP; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { if (is_sheep_dead(signo)) sd_err(sheep pid %d exited unexpectedly., sheep_pid); 
@@ -552,7 +552,7 @@ static void crash_handler(int signo) reraise_crash_signal(signo, 1); } -static void sighup_handler(int signo) +static void sighup_handler(int signo, siginfo_t *info, void *context) { rotate_log(); } diff --git a/lib/util.c b/lib/util.c index 21e0143..089455d 100644 --- a/lib/util.c +++ b/lib/util.c @@ -524,25 +524,28 @@ const char *data_to_str(void *data, size_t data_length) * If 'once' is true, the signal will be restored to the default state * after 'handler' is called. */ -int install_sighandler(int signum, void (*handler)(int), bool once) +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once) { struct sigaction sa = {}; sa.sa_handler = handler; + sa.sa_flags = SA_SIGINFO; + if (once) - sa.sa_flags = SA_RESETHAND | SA_NODEFER; + sa.sa_flags = sa.sa_flags | SA_RESETHAND | SA_NODEFER; sigemptyset(sa.sa_mask); return sigaction(signum, sa, NULL); } -int install_crash_handler(void (*handler)(int)) +int install_crash_handler(void (*handler)(int, siginfo_t *, void *)) { return install_sighandler(SIGSEGV, handler, true) || install_sighandler(SIGABRT, handler, true) || install_sighandler(SIGBUS, handler, true) || install_sighandler(SIGILL, handler, true) || - install_sighandler(SIGFPE, handler, true); + install_sighandler(SIGFPE, handler, true) || + install_sighandler(SIGQUIT, handler, true); } /* diff --git a/sheep/sheep.c b/sheep/sheep.c index e0a034f..6c540ae 100644 --- a/sheep/sheep.c +++ b/sheep/sheep.c @@ -239,7 +239,7 @@ static void signal_handler(int listen_fd, int events, void *data) ret = read(sigfd, siginfo, sizeof(siginfo)); assert(ret == sizeof(siginfo)); - sd_debug(signal %d, siginfo.ssi_signo); + sd_debug(signal %d, ssi pid %d, siginfo.ssi_signo, siginfo.ssi_pid); switch (siginfo.ssi_signo) { case SIGTERM: sys-cinfo.status = SD_STATUS_KILLED; @@ -276,9 +276,10 @@ static int init_signal(void) return 0; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void 
*context) { - sd_emerg(sheep exits unexpectedly (%s)., strsignal(signo)); + sd_emerg(sheep exits unexpectedly (%s), si pid %d, uid %d, errno %d, code %d, + strsignal(signo), info-si_pid, info-si_uid, info-si_errno, info-si_code); sd_backtrace(); sd_dump_variable(__sys); @@ -639,7 +640,7 @@ end: return status; } -static void sighup_handler(int signum) +static void sighup_handler(int signo, siginfo_t *info, void *context) { if (unlikely(logger_pid == -1)) return; diff --git
[sheepdog] [PATCH 2/5] tests: avoid qemu-img snapshot warning
qemu-img's snapshot option prints a warning message while probing a raw image, so filter it out using sed. Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com --- tests/functional/013 | 2 +- tests/functional/017 | 12 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/tests/functional/013 b/tests/functional/013 index f724841..d19d8f8 100755 --- a/tests/functional/013 +++ b/tests/functional/013 @@ -15,7 +15,7 @@ _cluster_format -c 1 _vdi_create test 4G for i in `seq 1 9`; do $QEMU_IO -f raw -c write 0 512 -P $i sheepdog:test | _filter_qemu_io -$QEMU_IMG snapshot -c tag$i sheepdog:test +$QEMU_IMG snapshot -c tag$i sheepdog:test 2>&1 | sed '/WARNING/, +2 d' done $QEMU_IO -f raw -c read 0 512 -P 9 sheepdog:test | _filter_qemu_io diff --git a/tests/functional/017 b/tests/functional/017 index 1c22c76..2c34a55 100755 --- a/tests/functional/017 +++ b/tests/functional/017 @@ -13,14 +13,14 @@ _wait_for_sheep 6 _cluster_format -c 1 _vdi_create test 4G -$QEMU_IMG snapshot -c tag1 sheepdog:test -$QEMU_IMG snapshot -c tag2 sheepdog:test -$QEMU_IMG snapshot -c tag3 sheepdog:test +$QEMU_IMG snapshot -c tag1 sheepdog:test 2>&1 | sed '/WARNING/, +2 d' +$QEMU_IMG snapshot -c tag2 sheepdog:test 2>&1 | sed '/WARNING/, +2 d' +$QEMU_IMG snapshot -c tag3 sheepdog:test 2>&1 | sed '/WARNING/, +2 d' _vdi_create test2 4G -$QEMU_IMG snapshot -c tag1 sheepdog:test2 -$QEMU_IMG snapshot -c tag2 sheepdog:test2 +$QEMU_IMG snapshot -c tag1 sheepdog:test2 2>&1 | sed '/WARNING/, +2 d' +$QEMU_IMG snapshot -c tag2 sheepdog:test2 2>&1 | sed '/WARNING/, +2 d' $QEMU_IO -f raw -c write 0 512 sheepdog:test2:1 | _filter_qemu_io -$QEMU_IMG snapshot -c tag3 sheepdog:test2 +$QEMU_IMG snapshot -c tag3 sheepdog:test2 2>&1 | sed '/WARNING/, +2 d' $DOG vdi tree | _filter_short_date -- 2.1.0 -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
[sheepdog] [PATCH 3/5] tests: correct vdi list
dog vdi list add column Block Size Shift, add them to test cases. Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com --- tests/functional/073.out | 2 +- tests/functional/081.out | 6 +++--- tests/functional/082.out | 6 +++--- tests/functional/087.out | 10 +- tests/functional/089.out | 2 +- tests/functional/090.out | 6 +++--- tests/functional/099.out | 4 ++-- 7 files changed, 18 insertions(+), 18 deletions(-) diff --git a/tests/functional/073.out b/tests/functional/073.out index 8dd2173..3c5fd47 100644 --- a/tests/functional/073.out +++ b/tests/functional/073.out @@ -6,6 +6,6 @@ Cluster created at DATE Epoch Time Version [Host:Port:V-Nodes,,,] DATE 1 [127.0.0.1:7000:128, 127.0.0.1:7001:128, 127.0.0.1:7002:128] - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift test 0 4.0 MB 0.0 MB 0.0 MB DATE 7c2b25 3 hello diff --git a/tests/functional/081.out b/tests/functional/081.out index 92df8ca..7092e97 100644 --- a/tests/functional/081.out +++ b/tests/functional/081.out @@ -55,7 +55,7 @@ vdi.c HTTP/1.1 416 Requested Range Not Satisfiable HTTP/1.1 416 Requested Range Not Satisfiable HTTP/1.1 416 Requested Range Not Satisfiable - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift sd/dog 0 16 PB 56 MB 0.0 MB DATE 5a5cbf4:2 sd 0 16 PB 8.0 MB 0.0 MB DATE 7927f24:2 sd/sheep 0 16 PB 144 MB 0.0 MB DATE 8ad11e4:2 @@ -65,7 +65,7 @@ data137 data19 data4 data97 - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift sd/dog 0 16 PB 56 MB 0.0 MB DATE 5a5cbf4:2 sd 0 16 PB 8.0 MB 0.0 MB DATE 7927f24:2 sd/sheep 0 16 PB 144 MB 0.0 MB DATE 8ad11e4:2 @@ -73,7 +73,7 @@ data97 sd/sheep/allocator 0 16 PB 268 MB 0.0 MB DATE fd57fc4:2 dog sheep - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift sd/dog 
0 16 PB 56 MB 0.0 MB DATE 5a5cbf4:2 sd 0 16 PB 8.0 MB 0.0 MB DATE 7927f24:2 sd/sheep 0 16 PB 144 MB 0.0 MB DATE 8ad11e4:2 diff --git a/tests/functional/082.out b/tests/functional/082.out index b3f4dd9..78c5e6a 100644 --- a/tests/functional/082.out +++ b/tests/functional/082.out @@ -60,7 +60,7 @@ trace.c treeview.c trunk.c vdi.c - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift sd/dog 0 16 PB 56 MB 0.0 MB DATE 5a5cbf4:2 sd 0 16 PB 8.0 MB 0.0 MB DATE 7927f24:2 sd/sheep 0 16 PB 176 MB 0.0 MB DATE 8ad11e4:2 @@ -78,7 +78,7 @@ data6 data7 data8 data9 - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift sd/dog 0 16 PB 56 MB 0.0 MB DATE 5a5cbf4:2 sd 0 16 PB 8.0 MB 0.0 MB DATE 7927f24:2 sd/sheep 0 16 PB 176 MB 0.0 MB DATE 8ad11e4:2 @@ -86,7 +86,7 @@ data9 sd/sheep/allocator 0 16 PB 316 MB 0.0 MB DATE fd57fc4:2 dog sheep - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift sd/dog 0 16 PB 56 MB 0.0 MB DATE 5a5cbf4:2 sd 0 16 PB 8.0 MB 0.0 MB DATE 7927f24:2 sd/sheep 0 16 PB 176 MB 0.0 MB DATE 8ad11e4:2 diff --git a/tests/functional/087.out b/tests/functional/087.out index 04e4210..0fcc7f3 100644 --- a/tests/functional/087.out +++ b/tests/functional/087.out @@ -2,11 +2,11 @@ QA output created by 087 using backend plain store 206 206 - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift sd 0 16 PB 4.0 MB 0.0 MB DATE 7927f24:2 sd/sheep 0 16 PB 8.0 MB 0.0 MB DATE 8ad11e4:2 sd/sheep/allocator 0 16 PB
Re: [sheepdog] [PATCH v2] sheepdog: show more detail of the crash source
At Wed, 11 Feb 2015 23:44:53 -0800, Wang Zhengyong wrote: In the current sheepdog, when sheepdog crashed, there is too little information about the signal source. This patch use (*handler)(int, siginfo_t *, void *) instead of (*handler)(int). In this way, can show more detail of the crash problem, especially the pid of singal sender Cc: Hitoshi Mitake mitake.hito...@gmail.com Signed-off-by: Wang Zhengyong wangzhengy...@cmss.chinamobile.com --- v2: fix the wrong handler assignment --- Sorry, I missed style problems in the previous version, checkpatch reports like below: WARNING: line over 80 characters #72: FILE: include/util.h:111: +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once); WARNING: line over 80 characters #108: FILE: lib/util.c:527: +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once) WARNING: line over 80 characters #159: FILE: sheep/sheep.c:282: + strsignal(signo), info-si_pid, info-si_uid, info-si_errno, info-si_code); Could you fix them? Then I can apply it. 
Thanks, Hitoshi dog/dog.c |2 +- include/util.h |4 ++-- lib/logger.c|4 ++-- lib/util.c | 13 - sheep/sheep.c |9 + shepherd/shepherd.c |2 +- 6 files changed, 19 insertions(+), 15 deletions(-) diff --git a/dog/dog.c b/dog/dog.c index 54520dd..77aa27b 100644 --- a/dog/dog.c +++ b/dog/dog.c @@ -368,7 +368,7 @@ static const struct sd_option *build_sd_options(const char *opts) return sd_opts; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { sd_err(dog exits unexpectedly (%s)., strsignal(signo)); diff --git a/include/util.h b/include/util.h index 6a513e0..3c34b40 100644 --- a/include/util.h +++ b/include/util.h @@ -108,8 +108,8 @@ int rmdir_r(const char *dir_path); int purge_directory(const char *dir_path); bool is_numeric(const char *p); const char *data_to_str(void *data, size_t data_length); -int install_sighandler(int signum, void (*handler)(int), bool once); -int install_crash_handler(void (*handler)(int)); +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once); +int install_crash_handler(void (*handler)(int, siginfo_t *, void *)); void reraise_crash_signal(int signo, int status); pid_t gettid(void); int tkill(int tid, int sig); diff --git a/lib/logger.c b/lib/logger.c index 02bab00..da0ebac 100644 --- a/lib/logger.c +++ b/lib/logger.c @@ -531,7 +531,7 @@ static bool is_sheep_dead(int signo) return signo == SIGHUP; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { if (is_sheep_dead(signo)) sd_err(sheep pid %d exited unexpectedly., sheep_pid); @@ -552,7 +552,7 @@ static void crash_handler(int signo) reraise_crash_signal(signo, 1); } -static void sighup_handler(int signo) +static void sighup_handler(int signo, siginfo_t *info, void *context) { rotate_log(); } diff --git a/lib/util.c b/lib/util.c index 21e0143..e217629 100644 --- a/lib/util.c +++ b/lib/util.c @@ -524,25 +524,28 @@ const char *data_to_str(void 
*data, size_t data_length) * If 'once' is true, the signal will be restored to the default state * after 'handler' is called. */ -int install_sighandler(int signum, void (*handler)(int), bool once) +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once) { struct sigaction sa = {}; - sa.sa_handler = handler; + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO; + if (once) - sa.sa_flags = SA_RESETHAND | SA_NODEFER; + sa.sa_flags = sa.sa_flags | SA_RESETHAND | SA_NODEFER; sigemptyset(sa.sa_mask); return sigaction(signum, sa, NULL); } -int install_crash_handler(void (*handler)(int)) +int install_crash_handler(void (*handler)(int, siginfo_t *, void *)) { return install_sighandler(SIGSEGV, handler, true) || install_sighandler(SIGABRT, handler, true) || install_sighandler(SIGBUS, handler, true) || install_sighandler(SIGILL, handler, true) || - install_sighandler(SIGFPE, handler, true); + install_sighandler(SIGFPE, handler, true) || + install_sighandler(SIGQUIT, handler, true); } /* diff --git a/sheep/sheep.c b/sheep/sheep.c index e0a034f..6c540ae 100644 --- a/sheep/sheep.c +++ b/sheep/sheep.c @@ -239,7 +239,7 @@ static void signal_handler(int listen_fd, int events, void *data) ret = read(sigfd, siginfo, sizeof(siginfo)); assert(ret == sizeof(siginfo)); - sd_debug(signal %d, siginfo.ssi_signo); + sd_debug(signal %d, ssi pid %d, siginfo.ssi_signo, siginfo.ssi_pid); switch
[sheepdog] [PATCH v2] sheepdog: show more detail of the crash source
In the current sheepdog, when sheepdog crashes, there is too little information about the signal source. This patch uses (*handler)(int, siginfo_t *, void *) instead of (*handler)(int). In this way, it can show more detail about the crash, especially the pid of the signal sender. Cc: Hitoshi Mitake mitake.hito...@gmail.com Signed-off-by: Wang Zhengyong wangzhengy...@cmss.chinamobile.com --- v2: fix the wrong handler assignment --- dog/dog.c |2 +- include/util.h |4 ++-- lib/logger.c|4 ++-- lib/util.c | 13 - sheep/sheep.c |9 + shepherd/shepherd.c |2 +- 6 files changed, 19 insertions(+), 15 deletions(-) diff --git a/dog/dog.c b/dog/dog.c index 54520dd..77aa27b 100644 --- a/dog/dog.c +++ b/dog/dog.c @@ -368,7 +368,7 @@ static const struct sd_option *build_sd_options(const char *opts) return sd_opts; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { sd_err(dog exits unexpectedly (%s)., strsignal(signo)); diff --git a/include/util.h b/include/util.h index 6a513e0..3c34b40 100644 --- a/include/util.h +++ b/include/util.h @@ -108,8 +108,8 @@ int rmdir_r(const char *dir_path); int purge_directory(const char *dir_path); bool is_numeric(const char *p); const char *data_to_str(void *data, size_t data_length); -int install_sighandler(int signum, void (*handler)(int), bool once); -int install_crash_handler(void (*handler)(int)); +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once); +int install_crash_handler(void (*handler)(int, siginfo_t *, void *)); void reraise_crash_signal(int signo, int status); pid_t gettid(void); int tkill(int tid, int sig); diff --git a/lib/logger.c b/lib/logger.c index 02bab00..da0ebac 100644 --- a/lib/logger.c +++ b/lib/logger.c @@ -531,7 +531,7 @@ static bool is_sheep_dead(int signo) return signo == SIGHUP; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { if (is_sheep_dead(signo)) sd_err(sheep 
pid %d exited unexpectedly., sheep_pid); @@ -552,7 +552,7 @@ static void crash_handler(int signo) reraise_crash_signal(signo, 1); } -static void sighup_handler(int signo) +static void sighup_handler(int signo, siginfo_t *info, void *context) { rotate_log(); } diff --git a/lib/util.c b/lib/util.c index 21e0143..e217629 100644 --- a/lib/util.c +++ b/lib/util.c @@ -524,25 +524,28 @@ const char *data_to_str(void *data, size_t data_length) * If 'once' is true, the signal will be restored to the default state * after 'handler' is called. */ -int install_sighandler(int signum, void (*handler)(int), bool once) +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once) { struct sigaction sa = {}; - sa.sa_handler = handler; + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO; + if (once) - sa.sa_flags = SA_RESETHAND | SA_NODEFER; + sa.sa_flags = sa.sa_flags | SA_RESETHAND | SA_NODEFER; sigemptyset(sa.sa_mask); return sigaction(signum, sa, NULL); } -int install_crash_handler(void (*handler)(int)) +int install_crash_handler(void (*handler)(int, siginfo_t *, void *)) { return install_sighandler(SIGSEGV, handler, true) || install_sighandler(SIGABRT, handler, true) || install_sighandler(SIGBUS, handler, true) || install_sighandler(SIGILL, handler, true) || - install_sighandler(SIGFPE, handler, true); + install_sighandler(SIGFPE, handler, true) || + install_sighandler(SIGQUIT, handler, true); } /* diff --git a/sheep/sheep.c b/sheep/sheep.c index e0a034f..6c540ae 100644 --- a/sheep/sheep.c +++ b/sheep/sheep.c @@ -239,7 +239,7 @@ static void signal_handler(int listen_fd, int events, void *data) ret = read(sigfd, siginfo, sizeof(siginfo)); assert(ret == sizeof(siginfo)); - sd_debug(signal %d, siginfo.ssi_signo); + sd_debug(signal %d, ssi pid %d, siginfo.ssi_signo, siginfo.ssi_pid); switch (siginfo.ssi_signo) { case SIGTERM: sys-cinfo.status = SD_STATUS_KILLED; @@ -276,9 +276,10 @@ static int init_signal(void) return 0; } -static void 
crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { - sd_emerg(sheep exits unexpectedly (%s)., strsignal(signo)); + sd_emerg(sheep exits unexpectedly (%s), si pid %d, uid %d, errno %d, code %d, + strsignal(signo), info-si_pid, info-si_uid, info-si_errno, info-si_code); sd_backtrace(); sd_dump_variable(__sys); @@ -639,7 +640,7 @@ end: return status; } -static void sighup_handler(int signum) +static void sighup_handler(int signo, siginfo_t *info, void *context) {
[sheepdog] [PATCH 5/5] tests: fix vnode strategy output
Since commit 5fed9d6, cluster vnodes strategy information is printed, so fix the expected output in the test cases. Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com --- tests/functional/030.out | 1 + tests/functional/096.out | 2 ++ 2 files changed, 3 insertions(+) diff --git a/tests/functional/030.out b/tests/functional/030.out index 2c788f0..baf51a7 100644 --- a/tests/functional/030.out +++ b/tests/functional/030.out @@ -37,6 +37,7 @@ s test22 10 MB 12 MB 0.0 MB DATE fd3816 3 22 test20 10 MB 0.0 MB 12 MB DATE fd3817 3 22 Cluster status: running, auto-recovery enabled Cluster store: plain with 6 redundancy policy +Cluster vnodes strategy: auto Cluster vnode mode: node Cluster created at DATE diff --git a/tests/functional/096.out b/tests/functional/096.out index 2ff9dc6..a555287 100644 --- a/tests/functional/096.out +++ b/tests/functional/096.out @@ -27,6 +27,7 @@ $ ../../dog/dog cluster format -c 3 $ ../../dog/dog cluster info -v Cluster status: running, auto-recovery enabled Cluster store: plain with 3 redundancy policy +Cluster vnodes strategy: auto Cluster vnode mode: node Cluster created at DATE @@ -80,6 +81,7 @@ The cluster's redundancy level is set to 2, the old one was 3. $ ../../dog/dog cluster info -v Cluster status: running, auto-recovery enabled Cluster store: plain with 2 redundancy policy +Cluster vnodes strategy: auto Cluster vnode mode: node Cluster created at DATE -- 2.1.0 -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support
On Thu, Feb 12, 2015 at 03:19:21PM +0900, Hitoshi Mitake wrote: At Tue, 10 Feb 2015 18:35:58 +0800, Liu Yuan wrote: On Tue, Feb 10, 2015 at 06:56:33PM +0900, Teruaki Ishizaki wrote: (2015/02/10 17:58), Liu Yuan wrote: On Tue, Feb 10, 2015 at 05:22:02PM +0900, Teruaki Ishizaki wrote: (2015/02/10 12:10), Liu Yuan wrote: On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote: Previously, qemu block driver of sheepdog used hard-coded VDI object size. This patch enables users to handle block_size_shift value for calculating VDI object size. When you start qemu, you don't need to specify additional command option. But when you create the VDI which doesn't have default object size with qemu-img command, you specify block_size_shift option. If you want to create a VDI of 8MB (1 << 23) object size, you need to specify the following command option. # qemu-img create -o block_size_shift=23 sheepdog:test1 100M In addition, when you don't specify qemu-img command option, a default value of sheepdog cluster is used for creating VDI. # qemu-img create sheepdog:test2 100M Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp --- V4: - Limit a read/write buffer size for creating a preallocated VDI. - Replace a parse function for the block_size_shift option. - Fix an error message. V3: - Delete the needless operation of buffer. - Delete the needless operations of request header. for SD_OP_GET_CLUSTER_DEFAULT. - Fix coding style problems. V2: - Fix coding style problem (white space). - Add members, store_policy and block_size_shift to struct SheepdogVdiReq. - Initialize request header to use block_size_shift specified by user.
--- block/sheepdog.c | 138 ++--- include/block/block_int.h | 1 + 2 files changed, 119 insertions(+), 20 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index be3176f..a43b947 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -37,6 +37,7 @@ #define SD_OP_READ_VDIS 0x15 #define SD_OP_FLUSH_VDI 0x16 #define SD_OP_DEL_VDI 0x17 +#define SD_OP_GET_CLUSTER_DEFAULT 0x18 #define SD_FLAG_CMD_WRITE 0x01 #define SD_FLAG_CMD_COW 0x02 @@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq { uint32_t base_vdi_id; uint8_t copies; uint8_t copy_policy; -uint8_t reserved[2]; +uint8_t store_policy; +uint8_t block_size_shift; uint32_t snapid; uint32_t type; uint32_t pad[2]; @@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp { uint32_t pad[5]; } SheepdogVdiRsp; +typedef struct SheepdogClusterRsp { +uint8_t proto_ver; +uint8_t opcode; +uint16_t flags; +uint32_t epoch; +uint32_t id; +uint32_t data_length; +uint32_t result; +uint8_t nr_copies; +uint8_t copy_policy; +uint8_t block_size_shift; +uint8_t __pad1; +uint32_t __pad2[6]; +} SheepdogClusterRsp; + typedef struct SheepdogInode { char name[SD_MAX_VDI_LEN]; char tag[SD_MAX_VDI_TAG_LEN]; @@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot, hdr.vdi_size = s->inode.vdi_size; hdr.copy_policy = s->inode.copy_policy; hdr.copies = s->inode.nr_copies; +hdr.block_size_shift = s->inode.block_size_shift; ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen); @@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot, static int sd_prealloc(const char *filename, Error **errp) { BlockDriverState *bs = NULL; +BDRVSheepdogState *base = NULL; +unsigned long buf_size; uint32_t idx, max_idx; +uint32_t object_size; int64_t vdi_size; -void *buf = g_malloc0(SD_DATA_OBJ_SIZE); +void *buf = NULL; int ret; ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL, @@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, Error **errp) ret = vdi_size; goto out; } -max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE); + +base = bs->opaque; +object_size = (UINT32_C(1) << base->inode.block_size_shift); +buf_size = MIN(object_size, SD_DATA_OBJ_SIZE); +buf = g_malloc0(buf_size); + +max_idx = DIV_ROUND_UP(vdi_size, buf_size); for (idx = 0; idx < max_idx; idx++) { /* * The created image can be a cloned image, so we need to read * a data from the source image. */ -ret = bdrv_pread(bs, idx
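As a side note, the object-size arithmetic the patch relies on is easy to check in isolation: a VDI's object size is 1 << block_size_shift bytes, and preallocation walks DIV_ROUND_UP(vdi_size, buf_size) buffers. A minimal self-contained sketch (the macro name follows qemu; the helper names are illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding helper qemu uses. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* A VDI's data object size is 1 << block_size_shift bytes,
 * so shift 23 gives the 8 MB objects from the example above. */
static uint32_t object_size_from_shift(uint8_t block_size_shift)
{
    return UINT32_C(1) << block_size_shift;
}

/* Number of buffer-sized steps sd_prealloc() walks for a VDI. */
static uint64_t prealloc_steps(uint64_t vdi_size, uint64_t buf_size)
{
    return DIV_ROUND_UP(vdi_size, buf_size);
}
```

For the example in the changelog, a 100M VDI with shift 23 needs 13 preallocation steps (12 full 8 MB objects plus a partial one).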
Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots
At Thu, 12 Feb 2015 14:38:37 +0800, Liu Yuan wrote: On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote: Current dog vdi snapshot command creates a new snapshot unconditionally, even if a working VDI doesn't have its own objects. In such a case, the created snapshot is redundant because the same VDI already exists. What kind of use case will create two identical snapshots? This logic is simple and code is clean, but I doubt there are real users of this option. Generally speaking, taking snapshots periodically is an ordinary use case of enterprise SAN. Of course sheepdog can support this use case. In the case of sheepdog, a cron job (e.g. daily) which invokes dog vdi snapshot simply enables it. But if a VDI doesn't have COWed objects, the snapshot will be redundant. So I want to add this option. Of course, vdi list will provide information about the COWed objects (used field). But vdi list is a heavy operation in a cluster which has many VDIs because it issues a bunch of read requests for inode headers. So avoiding vdi listing as much as possible is fine. Thanks, Hitoshi This patch adds a new option -R to the dog command for reducing the identical snapshots.
Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- dog/vdi.c | 48 +++- 1 file changed, 47 insertions(+), 1 deletion(-) diff --git a/dog/vdi.c b/dog/vdi.c index 8e612af..ee465c2 100644 --- a/dog/vdi.c +++ b/dog/vdi.c @@ -40,6 +40,8 @@ static struct sd_option vdi_options[] = { neither comparing nor repairing"}, {'z', "block_size_shift", true, "specify the bit shift num for data object size"}, + {'R', "reduce-identical-snapshots", false, "do not create snapshot if " +"working VDI doesn't have its own objects"}, { 0, NULL, false, NULL }, }; @@ -61,6 +63,7 @@ static struct vdi_cmd_data { uint64_t oid; bool no_share; bool exist; + bool reduce_identical_snapshots; } vdi_cmd_data = { ~0, }; struct get_vdi_info { @@ -605,6 +608,31 @@ fail: return NULL; } +static bool has_own_objects(uint32_t vid, int *ret) Traditionally, we'll have functions return SD_RES_xxx because in this way we could propagate the ret to upper callers. So it is better to have has_own_objects return SD_RES_xxx for consistency. Thanks, Yuan -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
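For readers unfamiliar with the inode layout the patch inspects: a hedged sketch of what a has_own_objects()-style check can look like, assuming (as in sheepdog's inode) that data_vdi_id[idx] records which VDI owns data object idx. The fixed-size arrays and demo helpers here are hypothetical stand-ins, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* data_vdi_id[idx] names the VDI owning data object idx:
 * 0 = unallocated, own vid = COWed into this VDI,
 * another vid = still shared with a parent snapshot. */
static bool demo_has_own_objects(const uint32_t *data_vdi_id, size_t n,
                                 uint32_t vid)
{
    for (size_t i = 0; i < n; i++)
        if (data_vdi_id[i] == vid)
            return true;  /* at least one object was COWed: snapshot is useful */
    return false;         /* all inherited/unallocated: snapshot would be identical */
}

/* Working VDI 8 cloned from snapshot 7, nothing written yet. */
static bool demo_clean_vdi(void)
{
    const uint32_t map[4] = { 0, 7, 7, 0 };
    return demo_has_own_objects(map, 4, 8);
}

/* Same VDI after one object was rewritten (COWed). */
static bool demo_cowed_vdi(void)
{
    const uint32_t map[4] = { 0, 7, 8, 0 };
    return demo_has_own_objects(map, 4, 8);
}
```

This is also why the check is cheap compared with vdi list: it only needs the working VDI's own inode, not the inode headers of every VDI in the cluster.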
Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots
On Thu, Feb 12, 2015 at 03:59:51PM +0900, Hitoshi Mitake wrote: At Thu, 12 Feb 2015 14:38:37 +0800, Liu Yuan wrote: On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote: Current dog vdi snapshot command creates a new snapshot unconditionally, even if a working VDI doesn't have its own objects. In such a case, the created snapshot is redundant because the same VDI already exists. What kind of use case will create two identical snapshots? This logic is simple and code is clean, but I doubt there are real users of this option. Generally speaking, taking snapshots periodically is an ordinary use case of enterprise SAN. Of course sheepdog can support this use case. In the case of sheepdog, a cron job (e.g. daily) which invokes dog vdi snapshot simply enables it. But if a VDI doesn't have COWed objects, the snapshot will be redundant. So I want to add this option. Okay, your patch makes sense for periodic snapshots. But if dog has found identical snapshots, it won't create a new one and will return success to the caller. I assume the caller is some middleware; if there is no new vdi returned, will this cause trouble for it? This means it will need to call 'vdi list' to check whether a new vdi was created or not? I'm not against this patch, but I have some questions. For identical snapshots, the overhead is just one inode object created, no? It looks to me like the overhead is quite small and there's no need for a special option to remove it. Thanks Yuan -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH] sheepdog:show more detail of the crash source
At Wed, 11 Feb 2015 21:47:41 -0800, Wang Zhengyong wrote: In the current sheepdog, when sheepdog crashes, there is too little information about the signal source. This patch uses (*handler)(int, siginfo_t *, void *) instead of (*handler)(int). In this way, it can show more detail about the crash, especially the pid of the signal sender. Cc: Hitoshi Mitake mitake.hito...@gmail.com Signed-off-by: Wang Zhengyong wangzhengy...@cmss.chinamobile.com --- dog/dog.c | 2 +- include/util.h | 4 ++-- lib/logger.c | 4 ++-- lib/util.c | 11 +++ sheep/sheep.c | 9 + shepherd/shepherd.c | 2 +- 6 files changed, 18 insertions(+), 14 deletions(-) diff --git a/dog/dog.c b/dog/dog.c index 54520dd..77aa27b 100644 --- a/dog/dog.c +++ b/dog/dog.c @@ -368,7 +368,7 @@ static const struct sd_option *build_sd_options(const char *opts) return sd_opts; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { sd_err("dog exits unexpectedly (%s).", strsignal(signo)); diff --git a/include/util.h b/include/util.h index 6a513e0..3c34b40 100644 --- a/include/util.h +++ b/include/util.h @@ -108,8 +108,8 @@ int rmdir_r(const char *dir_path); int purge_directory(const char *dir_path); bool is_numeric(const char *p); const char *data_to_str(void *data, size_t data_length); -int install_sighandler(int signum, void (*handler)(int), bool once); -int install_crash_handler(void (*handler)(int)); +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once); +int install_crash_handler(void (*handler)(int, siginfo_t *, void *)); void reraise_crash_signal(int signo, int status); pid_t gettid(void); int tkill(int tid, int sig); diff --git a/lib/logger.c b/lib/logger.c index 02bab00..da0ebac 100644 --- a/lib/logger.c +++ b/lib/logger.c @@ -531,7 +531,7 @@ static bool is_sheep_dead(int signo) return signo == SIGHUP; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { if
(is_sheep_dead(signo)) sd_err("sheep pid %d exited unexpectedly.", sheep_pid); @@ -552,7 +552,7 @@ static void crash_handler(int signo) reraise_crash_signal(signo, 1); } -static void sighup_handler(int signo) +static void sighup_handler(int signo, siginfo_t *info, void *context) { rotate_log(); } diff --git a/lib/util.c b/lib/util.c index 21e0143..089455d 100644 --- a/lib/util.c +++ b/lib/util.c @@ -524,25 +524,28 @@ const char *data_to_str(void *data, size_t data_length) * If 'once' is true, the signal will be restored to the default state * after 'handler' is called. */ -int install_sighandler(int signum, void (*handler)(int), bool once) +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once) { struct sigaction sa = {}; sa.sa_handler = handler; The handler is now a function which should be called via sa.sa_sigaction. You need to assign handler to sa_sigaction. The other part looks good to me. Thanks, Hitoshi + sa.sa_flags = SA_SIGINFO; + if (once) - sa.sa_flags = SA_RESETHAND | SA_NODEFER; + sa.sa_flags = sa.sa_flags | SA_RESETHAND | SA_NODEFER; sigemptyset(&sa.sa_mask); return sigaction(signum, &sa, NULL); } -int install_crash_handler(void (*handler)(int)) +int install_crash_handler(void (*handler)(int, siginfo_t *, void *)) { return install_sighandler(SIGSEGV, handler, true) || install_sighandler(SIGABRT, handler, true) || install_sighandler(SIGBUS, handler, true) || install_sighandler(SIGILL, handler, true) || - install_sighandler(SIGFPE, handler, true); + install_sighandler(SIGFPE, handler, true) || + install_sighandler(SIGQUIT, handler, true); } /* diff --git a/sheep/sheep.c b/sheep/sheep.c index e0a034f..6c540ae 100644 --- a/sheep/sheep.c +++ b/sheep/sheep.c @@ -239,7 +239,7 @@ static void signal_handler(int listen_fd, int events, void *data) ret = read(sigfd, &siginfo, sizeof(siginfo)); assert(ret == sizeof(siginfo)); - sd_debug("signal %d", siginfo.ssi_signo); + sd_debug("signal %d, ssi pid %d", siginfo.ssi_signo,
siginfo.ssi_pid); switch (siginfo.ssi_signo) { case SIGTERM: sys->cinfo.status = SD_STATUS_KILLED; @@ -276,9 +276,10 @@ static int init_signal(void) return 0; } -static void crash_handler(int signo) +static void crash_handler(int signo, siginfo_t *info, void *context) { - sd_emerg("sheep exits unexpectedly (%s).", strsignal(signo)); + sd_emerg("sheep exits unexpectedly (%s), si pid %d, uid %d, errno %d, code %d", + strsignal(signo), info->si_pid, info->si_uid, info->si_errno, info->si_code);
[sheepdog] [PATCH] zookeeper: add more detailed description on how zk_watcher report states
From: Liu Yuan liuy...@cmss.chinamobile.com Signed-off-by: Liu Yuan liuy...@cmss.chinamobile.com --- sheep/cluster/zookeeper.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c index b603d36..e2ee248 100644 --- a/sheep/cluster/zookeeper.c +++ b/sheep/cluster/zookeeper.c @@ -685,6 +685,27 @@ static int add_event(enum zk_event_type type, struct zk_node *znode, void *buf, } } +/* + * Type value: + * -1 SESSION_EVENT, use State to indicate what kind of sub-event + * State value: + * -112 SESSION EXPIRED + * 1 CONNECTING + * 3 CONNECTED + * 1 CREATED_EVENT + * 2 DELETED_EVENT + * 3 CHANGED_EVENT + * 4 CHILD_EVENT + * + * While the connection to zk is disconnected (zk cluster in election, network is + * broken, etc.), the zk library will try to reconnect to the zk cluster on its own and + * report both the connection state and session state to sheep via zk_watcher. + * + * Once the connection is reestablished (state changed from 1 to 3) within the + * timeout window, the session is still valid, meaning that all the watchers + * will function as before. If not within the timeout window, zk_watcher will + * report to sheep that the session is expired.
+ */ static void zk_watcher(zhandle_t *zh, int type, int state, const char *path, void *ctx) { @@ -693,6 +714,8 @@ static void zk_watcher(zhandle_t *zh, int type, int state, const char *path, uint64_t lock_id; int ret; + sd_debug("path:%s, type:%d, state:%d", path, type, state); + if (type == ZOO_SESSION_EVENT && state == ZOO_EXPIRED_SESSION_STATE) { /* * do reconnect in main thread to avoid on-the-fly zookeeper @@ -702,8 +725,6 @@ static void zk_watcher(zhandle_t *zh, int type, int state, const char *path, return; } -/* CREATED_EVENT 1, DELETED_EVENT 2, CHANGED_EVENT 3, CHILD_EVENT 4 */ - sd_debug("path:%s, type:%d, state:%d", path, type, state); if (type == ZOO_CREATED_EVENT || type == ZOO_CHANGED_EVENT) { ret = sscanf(path, MEMBER_ZNODE "/%s", str); if (ret == 1) -- 1.9.1 -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
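A sketch of how the (type, state) pairs from the comment block decode; the integer values follow the ZooKeeper C client's ZOO_* constants (session event type is -1; within it, -112 = expired, 1 = connecting, 3 = connected), while the function name itself is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Decode a zk_watcher (type, state) pair per the comment table above.
 * Constants match zookeeper.h: ZOO_SESSION_EVENT = -1,
 * ZOO_EXPIRED_SESSION_STATE = -112, ZOO_CONNECTING_STATE = 1,
 * ZOO_CONNECTED_STATE = 3; node events use positive type values. */
static const char *zk_event_name(int type, int state)
{
    if (type == -1) {                 /* session event: look at state */
        switch (state) {
        case -112: return "session-expired";
        case 1:    return "connecting";
        case 3:    return "connected";
        default:   return "session-other";
        }
    }
    switch (type) {                   /* node events: state is irrelevant */
    case 1:  return "created";
    case 2:  return "deleted";
    case 3:  return "changed";
    case 4:  return "child";
    default: return "unknown";
    }
}
```

This mirrors the flow in zk_watcher: the expired-session pair is the only one that forces a reconnect in the main thread; the positive types drive the member-znode handling.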
Re: [sheepdog] object placement
On Sun, Feb 08, 2015 at 08:46:01PM +0100, Corin Langosch wrote: Hi guys, afaik sheepdog uses consistent hashing to map objects to nodes. But how do you choose where the individual ec-chunks of an object should go? Consistent hashing cannot be used here because different ec-chunks of the same object might get mapped to the same node. I looked at the sources and got an idea: you calculate a base node for the object and then for each ec-chunk you add its index. Example: we have 10 nodes (0..9), consistent hashing calculates node 7 for our object. As we have 4:2 encoding, we get the ec-chunks stored on [7,8,9,0,1,2]? Is my assumption correct, or how do you actually do it? Yes, you are right if we don't consider the virtual nodes added to consistent hashing to mitigate the object migration problem. But with virtual nodes in the picture, we can add one more rule to make sure all the data and ec-chunks are on different physical nodes. Thanks Yuan -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
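The base-node-plus-index rule discussed above can be sketched in a few lines (illustrative only: real sheepdog walks its vnode-aware consistent-hash ring and adds the extra distinct-physical-node rule, which this toy modulo walk ignores):

```c
#include <assert.h>

/* Node index for ec-chunk i of an object whose base node is `base`
 * on a ring of nr_nodes nodes: chunks occupy consecutive nodes,
 * wrapping around the ring, so a 4:2-encoded object (6 chunks)
 * lands on 6 distinct nodes whenever nr_nodes >= 6. */
static int chunk_node(int base, int nr_nodes, int i)
{
    return (base + i) % nr_nodes;
}
```

With the example from the mail (10 nodes, base node 7, 4:2 encoding), chunks 0..5 land on nodes 7, 8, 9, 0, 1, 2, matching the [7,8,9,0,1,2] guess.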
Re: [sheepdog] effective storing backups and deduplication
On Wed, Feb 11, 2015 at 03:57:32PM +0400, Vasiliy Tolstov wrote: Hi! I need to store user backups and allow users to download them. I see on Google that sheepdog supports deduplication, but can't find info about it in the sheepdog docs. Does sheepdog support deduplication? This deduplication is for SD's internal use, to store its own cluster snapshot. Also I think I should not use cluster-wide snapshots because I want a dedicated backup server separate from the other sheepdog nodes, so I need to copy the user vdi from a compute node to the backup node. Can somebody say how I can do that in an optimal way? Thanks! How about sheepdog's RESTful storage to store users' backups and allow downloading? For more detail, see https://github.com/sheepdog/sheepdog/wiki/HTTP-Simple-Storage Thanks Yuan -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
[sheepdog] effective storing backups and deduplication
Hi! I need to store user backups and allow users to download them. I see on Google that sheepdog supports deduplication, but can't find info about it in the sheepdog docs. Does sheepdog support deduplication? Also I think I should not use cluster-wide snapshots because I want a dedicated backup server separate from the other sheepdog nodes, so I need to copy the user vdi from a compute node to the backup node. Can somebody say how I can do that in an optimal way? Thanks! -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru jabber: v...@selfip.ru -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] effective storing backups and deduplication
2015-02-11 15:08 GMT+03:00 Liu Yuan namei.u...@gmail.com: On Wed, Feb 11, 2015 at 03:57:32PM +0400, Vasiliy Tolstov wrote: Hi! I need to store user backups and allow users to download them. I see on Google that sheepdog supports deduplication, but can't find info about it in the sheepdog docs. Does sheepdog support deduplication? This deduplication is for SD's internal use, to store its own cluster snapshot. So, if I have nearly identical backups (for example, 5 GB of data each with only 1 GB different), is the space needed for the two backups equal to 10 GB? How much work is needed for vdi deduplication? Also I think I should not use cluster-wide snapshots because I want a dedicated backup server separate from the other sheepdog nodes, so I need to copy the user vdi from a compute node to the backup node. Can somebody say how I can do that in an optimal way? Thanks! How about sheepdog's RESTful storage to store users' backups and allow downloading? For more detail, see https://github.com/sheepdog/sheepdog/wiki/HTTP-Simple-Storage Thanks Yes, I thought about it, and now I'm trying to understand how to add authentication and other needed stuff. -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru jabber: v...@selfip.ru -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] effective storing backups and deduplication
2015-02-11 15:28 GMT+03:00 Liu Yuan namei.u...@gmail.com: We need to know what the user's backups are. Is it the whole vdi or delta data for different vdis? The best scheme, as I see it, is: 1) If no backup exists for a vdi - create a full backup (simply copy all the data). 2) If a backup was already created - create a new backup and copy only the delta from the previous backup. 3) If a user deletes an old backup - remove the garbage pieces that do not belong to other vdis. 4) In steps 1 to 2 - check other vdi pieces for duplicate data and store only the difference. But I think this one is very problematic. -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru jabber: v...@selfip.ru -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
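Steps 1-2 of the scheme boil down to comparing the object tables of two snapshots and shipping only the changed entries. A minimal sketch with hypothetical names (not dog's API; real sheepdog would compare the inodes' object maps):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect into out[] the indices of objects that changed between the
 * previous backup's object table and the current snapshot's; only
 * these need to be copied into the delta backup. Returns the count. */
static size_t delta_objects(const uint32_t *prev, const uint32_t *cur,
                            size_t n, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (cur[i] != prev[i])   /* object rewritten since the last backup */
            out[k++] = i;
    return k;
}

static size_t demo_delta_count(void)
{
    uint32_t prev[5] = { 1, 1, 1, 1, 1 };
    uint32_t cur[5]  = { 1, 2, 1, 2, 1 };  /* two objects changed */
    size_t out[5];
    return delta_objects(prev, cur, 5, out);
}
```

Garbage collection (step 3) is the inverse walk: a piece can be deleted only when no remaining backup's table still references it.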
Re: [sheepdog] effective storing backups and deduplication
On Wed, Feb 11, 2015 at 04:32:34PM +0400, Vasiliy Tolstov wrote: 2015-02-11 15:28 GMT+03:00 Liu Yuan namei.u...@gmail.com: We need to know what the user's backups are. Is it the whole vdi or delta data for different vdis? The best scheme, as I see it, is: 1) If no backup exists for a vdi - create a full backup (simply copy all the data). 2) If a backup was already created - create a new backup and copy only the delta from the previous backup. 3) If a user deletes an old backup - remove the garbage pieces that do not belong to other vdis. 4) In steps 1 to 2 - check other vdi pieces for duplicate data and store only the difference. But I think this one is very problematic. This scheme can be built on sheepdog's current features: 0 use qemu-img (recommended because of better performance) or dog to read the base vdi. 1 use dog to back up the delta data for the different snapshots taken by qemu-img snapshot or dog vdi snapshot. 2 manage the relations between the delta data and the base for the user-defined snapshots in the upper layer. 3 use SD http storage to store the base and delta data. I guess you need something as the middle layer to map the user-defined snapshots to sheepdog's base and delta data and implement gc in this middle layer. Authentication would be better implemented in this middleware. Thanks Yuan -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] effective storing backups and deduplication
On Wed, Feb 11, 2015 at 04:14:35PM +0400, Vasiliy Tolstov wrote: 2015-02-11 15:08 GMT+03:00 Liu Yuan namei.u...@gmail.com: On Wed, Feb 11, 2015 at 03:57:32PM +0400, Vasiliy Tolstov wrote: Hi! I need to store user backups and allow users to download them. I see on Google that sheepdog supports deduplication, but can't find info about it in the sheepdog docs. Does sheepdog support deduplication? This deduplication is for SD's internal use, to store its own cluster snapshot. So, if I have nearly identical backups (for example, 5 GB of data each with only 1 GB different), is the space needed for the two backups equal to 10 GB? How much work is needed for vdi deduplication? We need to know what the user's backups are. Is it the whole vdi or delta data for different vdis? Cluster snapshot will snapshot the whole cluster and store it in a deduplicated way; I don't think it is what you need. Also I think I should not use cluster-wide snapshots because I want a dedicated backup server separate from the other sheepdog nodes, so I need to copy the user vdi from a compute node to the backup node. Can somebody say how I can do that in an optimal way? Thanks! How about sheepdog's RESTful storage to store users' backups and allow downloading? For more detail, see https://github.com/sheepdog/sheepdog/wiki/HTTP-Simple-Storage Thanks Yes, I thought about it, and now I'm trying to understand how to add authentication and other needed stuff. You can reference the OpenStack Swift implementation. But feel free to choose whatever you think reasonable for the authentication implementation for sheepdog. Thanks Yuan -- sheepdog mailing list sheepdog@lists.wpkg.org https://lists.wpkg.org/mailman/listinfo/sheepdog