Re: [sheepdog] effective storing backups and deduplication

2015-02-11 Thread Vasiliy Tolstov
2015-02-11 15:43 GMT+03:00 Liu Yuan namei.u...@gmail.com:
 This scheme can build on sheepdog's current features:

 0 use qemu-img (recommended because of better performance) or dog to read the
   base vdi.

 1 use dog to back up the delta data for different snapshots taken by
   qemu-img snapshot or dog vdi snapshot.

 2 manage the delta data and the base for the user-defined snapshot relations
   in the upper layer

 3 use SD http storage to store the base and delta data.

 I guess you need something as a middle layer to map the user-defined snapshots
 to sheepdog's base and delta data and implement gc in this middle layer.
 Authentication would be better implemented in this middleware.


Nice =)! So my questions are:
0 - do I need to run qemu-img snapshot sheepdog:base.img ?
1 - can you provide the dog command line so I understand what I need?

Big Thanks!

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


[sheepdog] Build failed in Jenkins: sheepdog-build #631

2015-02-11 Thread sheepdog-jenkins
See http://jenkins.sheepdog-project.org:8080/job/sheepdog-build/631/changes

Changes:

[liuyuan] dog: type cast miss at vdi_show_progress

--
[...truncated 57 lines...]
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for size_t... yes
checking for working alloca.h... yes
checking for alloca... yes
checking for dirent.h that defines DIR... yes
checking for library containing opendir... none required
checking for ANSI C header files... (cached) yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking arpa/inet.h usability... yes
checking arpa/inet.h presence... yes
checking for arpa/inet.h... yes
checking fcntl.h usability... yes
checking fcntl.h presence... yes
checking for fcntl.h... yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking netdb.h usability... yes
checking netdb.h presence... yes
checking for netdb.h... yes
checking netinet/in.h usability... yes
checking netinet/in.h presence... yes
checking for netinet/in.h... yes
checking for stdint.h... (cached) yes
checking for stdlib.h... (cached) yes
checking for string.h... (cached) yes
checking sys/ioctl.h usability... yes
checking sys/ioctl.h presence... yes
checking for sys/ioctl.h... yes
checking sys/param.h usability... yes
checking sys/param.h presence... yes
checking for sys/param.h... yes
checking sys/socket.h usability... yes
checking sys/socket.h presence... yes
checking for sys/socket.h... yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking syslog.h usability... yes
checking syslog.h presence... yes
checking for syslog.h... yes
checking for unistd.h... (cached) yes
checking for sys/types.h... (cached) yes
checking getopt.h usability... yes
checking getopt.h presence... yes
checking for getopt.h... yes
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking sys/sockio.h usability... no
checking sys/sockio.h presence... no
checking for sys/sockio.h... no
checking utmpx.h usability... yes
checking utmpx.h presence... yes
checking for utmpx.h... yes
checking urcu.h usability... yes
checking urcu.h presence... yes
checking for urcu.h... yes
checking urcu/uatomic.h usability... yes
checking urcu/uatomic.h presence... yes
checking for urcu/uatomic.h... yes
checking for an ANSI C-conforming const... yes
checking for uid_t in sys/types.h... yes
checking for inline... inline
checking for size_t... (cached) yes
checking whether time.h and sys/time.h may both be included... yes
checking for working volatile... yes
checking size of short... 2
checking size of int... 4
checking size of long... 8
checking size of long long... 8
checking sys/eventfd.h usability... yes
checking sys/eventfd.h presence... yes
checking for sys/eventfd.h... yes
checking sys/signalfd.h usability... yes
checking sys/signalfd.h presence... yes
checking for sys/signalfd.h... yes
checking sys/timerfd.h usability... yes
checking sys/timerfd.h presence... yes
checking for sys/timerfd.h... yes
checking whether closedir returns void... no
checking for error_at_line... yes
checking for mbstate_t... yes
checking for working POSIX fnmatch... yes
checking for pid_t... yes
checking vfork.h usability... no
checking vfork.h presence... no
checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... yes
checking for working vfork... (cached) yes
checking whether gcc needs -traditional... no
checking for stdlib.h... (cached) yes
checking for GNU libc compatible malloc... yes
checking for working memcmp... yes
checking for stdlib.h... (cached) yes
checking for GNU libc compatible realloc... yes
checking sys/select.h usability... yes
checking sys/select.h presence... yes
checking for sys/select.h... yes
checking for sys/socket.h... (cached) yes
checking types of arguments for select... int,fd_set *,struct timeval *
checking return type of signal handlers... void
checking for vprintf... yes
checking for _doprnt... no
checking for alarm... yes
checking for alphasort... yes
checking for atexit... yes
checking for bzero... yes
checking for dup2... yes
checking for endgrent... yes
checking for endpwent... yes
checking for fcntl... yes
checking for getcwd... yes
checking for getpeerucred... no
checking for getpeereid... no
checking for gettimeofday... yes
checking for inet_ntoa... yes
checking for memmove... yes
checking for memset... yes
checking for mkdir... yes
checking for scandir... yes
checking for select... yes
checking for socket... yes
checking for strcasecmp... yes

Re: [sheepdog] [PATCH] dog: fix to calculate a resizable max VDI size appropriately

2015-02-11 Thread Liu Yuan
On Tue, Feb 10, 2015 at 05:53:44PM +0900, Teruaki Ishizaki wrote:
 A resizable max VDI size was fixed value, 4TB.
 
 So, when block_size_shift was specified more than 22,
 resizing VDI size over 4TB caused error.
 
 This patch enables to calculate a resizable max VDI properly.
 
 Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
 ---
  dog/vdi.c |   24 +++-
  1 files changed, 15 insertions(+), 9 deletions(-)
 
 diff --git a/dog/vdi.c b/dog/vdi.c
 index 6cb813e..8e5ab13 100644
 --- a/dog/vdi.c
 +++ b/dog/vdi.c
 @@ -845,8 +845,8 @@ out:
  static int vdi_resize(int argc, char **argv)
  {
   const char *vdiname = argv[optind++];
 - uint64_t new_size;
 - uint32_t vid;
 + uint64_t new_size, old_max_total_size;
 + uint32_t vid, object_size;
   int ret;
   char buf[SD_INODE_HEADER_SIZE];
   struct sd_inode *inode = (struct sd_inode *)buf;
 @@ -863,13 +863,19 @@ static int vdi_resize(int argc, char **argv)
   if (ret != EXIT_SUCCESS)
   return ret;
  
 - if (new_size > SD_OLD_MAX_VDI_SIZE && 0 == inode->store_policy) {
 - sd_err("New VDI size is too large");
 - return EXIT_USAGE;
 - }
 -
 - if (new_size > SD_MAX_VDI_SIZE) {
 - sd_err("New VDI size is too large");
 + object_size = (UINT32_C(1) << inode->block_size_shift);
 + old_max_total_size = object_size * OLD_MAX_DATA_OBJS;
 + if (0 == inode->store_policy) {
 + if (new_size > old_max_total_size) {
 + sd_err("New VDI size is too large."
 +        " This volume's max size is %" PRIu64,
 +        old_max_total_size);
 + return EXIT_USAGE;
 + }
 + } else if (new_size > SD_MAX_VDI_SIZE) {
 + sd_err("New VDI size is too large"
 +        " This volume's max size is %llu",
 +        SD_MAX_VDI_SIZE);
   return EXIT_USAGE;
   }
  

Applied, thanks. BTW, we haven't yet settled how to make use of this additional
field. I don't think we should expose block_size_shift to plain users. For
users, object size is much more direct and simple to understand. There is no
reason we can't expose object size as an option to users, no?
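For reference, the limit the patch computes is just object size times the
maximum object count. A quick arithmetic sketch; the value of OLD_MAX_DATA_OBJS
used here (2^20) is an assumption, inferred from the old 4TB cap with 4MB
objects:

```shell
# Illustrates old_max_total_size from the patch above.
# Assumption: OLD_MAX_DATA_OBJS is 2^20, which reproduces the old
# 4TB cap when block_size_shift is 22 (4MB objects).
block_size_shift=22
object_size=$((1 << block_size_shift))            # 4194304 bytes (4MB)
old_max_data_objs=$((1 << 20))
echo $((object_size * old_max_data_objs))         # 4398046511104 bytes (4TB)

# with block_size_shift=23 (8MB objects) the same formula doubles the cap
echo $(( (1 << 23) * old_max_data_objs ))         # 8796093022208 bytes (8TB)
```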

Thanks
Yuan


Re: [sheepdog] effective storing backups and deduplication

2015-02-11 Thread Liu Yuan
On Wed, Feb 11, 2015 at 05:57:25PM +0400, Vasiliy Tolstov wrote:
 2015-02-11 15:43 GMT+03:00 Liu Yuan namei.u...@gmail.com:
  This scheme can build on sheepdog's current features:
 
  0 use qemu-img (recommended because of better performance) or dog to read
    the base vdi.
 
  1 use dog to back up the delta data for different snapshots taken by
qemu-img snapshot or dog vdi snapshot.
 
  2 manage the delta data and the base for the user-defined snapshot
    relations in the upper layer
 
  3 use SD http storage to store the base and delta data.
 
  I guess you need something as a middle layer to map the user-defined
  snapshots to sheepdog's base and delta data and implement gc in this
  middle layer. Authentication would be better implemented in this middleware.
 
 
 Nice =)! So my questions are:
 0 - do I need to run qemu-img snapshot sheepdog:base.img ?

qemu-img snapshot sheepdog:your_vdi will snapshot this vdi and switch to a new
working vdi; the old vdi will be marked as a snapshot.

 1 - can you provide the dog command line so I understand what I need?

See this https://github.com/sheepdog/sheepdog/wiki/Image-Backup
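For the archives, the base-plus-delta cycle from that page looks roughly like
the following. This is a hedged sketch: the exact dog subcommands and flags
(vdi read, vdi backup, -F/-s) are assumptions here and should be checked
against the Image-Backup wiki page, and it needs a running cluster.

```shell
# Sketch of steps 0-3 of the scheme (flags are assumptions; see the wiki).
vdi=base

# full backup: snapshot, then dump the base image
dog vdi snapshot -s snap1 "$vdi"
dog vdi read "$vdi" -s snap1 > base.img

# incremental backup: new snapshot, then only the delta between the two
dog vdi snapshot -s snap2 "$vdi"
dog vdi backup "$vdi" -F snap1 -s snap2 > delta_1_2.dat

# base.img and delta_1_2.dat are then pushed to SD http storage, while the
# middleware keeps the user-snapshot -> (base, deltas) mapping and runs GC.
```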

Thanks
Yuan


Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support

2015-02-11 Thread Liu Yuan
On Thu, Feb 12, 2015 at 10:51:25AM +0900, Teruaki Ishizaki wrote:
 (2015/02/10 20:12), Liu Yuan wrote:
 On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
 Previously, qemu block driver of sheepdog used hard-coded VDI object size.
 This patch enables users to handle block_size_shift value for
 calculating VDI object size.
 
 When you start qemu, you don't need to specify additional command option.
 
 But when you create the VDI which doesn't have default object size
 with qemu-img command, you specify block_size_shift option.
 
  If you want to create a VDI of 8MB (1 << 23) object size,
 you need to specify following command option.
 
   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
 
 In addition, when you don't specify qemu-img command option,
 a default value of sheepdog cluster is used for creating VDI.
 
   # qemu-img create sheepdog:test2 100M
 
 Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
 ---
 V4:
   - Limit a read/write buffer size for creating a preallocated VDI.
   - Replace a parse function for the block_size_shift option.
   - Fix an error message.
 
 V3:
   - Delete the needless operation of buffer.
   - Delete the needless operations of request header.
 for SD_OP_GET_CLUSTER_DEFAULT.
   - Fix coding style problems.
 
 V2:
   - Fix coding style problem (white space).
   - Add members, store_policy and block_size_shift to struct SheepdogVdiReq.
   - Initialize request header to use block_size_shift specified by user.
 ---
   block/sheepdog.c  |  138 
  ++---
   include/block/block_int.h |1 +
   2 files changed, 119 insertions(+), 20 deletions(-)
 
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index be3176f..a43b947 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -37,6 +37,7 @@
   #define SD_OP_READ_VDIS  0x15
   #define SD_OP_FLUSH_VDI  0x16
   #define SD_OP_DEL_VDI    0x17
 +#define SD_OP_GET_CLUSTER_DEFAULT   0x18
 
 This might not be necessary. For old qemu or the qemu-img without setting
 option, the block_size_shift will be 0.
 
 If we make 0 to represent 4MB object, then we don't need to get the default
 cluster object size.
 
 We might even get rid of the idea of cluster default size. The downside is
 that, if we want to create a vdi with a different object size than the
 default 4MB, we have to specify it every time for qemu-img or dog.
 
 If we choose to keep the idea of cluster default size, I think we'd also try
 to avoid calling this request from QEMU, to make backward compatibility
 easier. In this scenario, 0 might be used to ask a new sheep to decide to use
 the cluster default size.
 
 Both old and new QEMU will send 0 to sheep, and both old and new sheep can
 handle 0, though it has different meanings.
 
 Table for this bit as 0:
 Qe: qemu
 SD: Sheep daemon
 CDS: Cluster Default Size
 Ign: Ignored by the sheep daemon
 
 Qe/SD   new    old
 new     CDS    Ign
 old     CDS    NULL
 Does Ign mean that the VDI is handled with a 4MB object size?

Yes, an old sheep can only handle 4MB objects and doesn't check this field at all.

 
 
 I think this approach is acceptable. The difference to your patch is that
 we don't send SD_OP_GET_CLUSTER_DEFAULT to sheep daemon and
 SD_OP_GET_CLUSTER_DEFAULT can be removed.
 When users create a new VDI with qemu-img, qemu's Sheepdog backend
 driver calculates the max VDI size limit.

 But if the block_size_shift option is not specified, qemu's Sheepdog backend
 driver can't calculate the max VDI size limit.

If block_size_shift is not specified, this means:

1 for old sheep, use the 4MB size
2 for new sheep, use the cluster-wide default value.

And sheep can then calculate it on its own, no?
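The two-case fallback described above can be sketched as a tiny helper;
resolve_block_size_shift is a made-up name for illustration, not sheepdog code:

```shell
# block_size_shift == 0 in the request means "decide for me":
# an old sheep ignores the field (fixed 4MB objects), a new sheep
# substitutes the cluster-wide default.
resolve_block_size_shift() {
    requested=$1
    cluster_default=$2
    if [ "$requested" -eq 0 ]; then
        echo "$cluster_default"    # new sheep: fall back to cluster default
    else
        echo "$requested"          # explicit value from qemu-img/dog wins
    fi
}

resolve_block_size_shift 0 22     # prints 22 (cluster default, 4MB objects)
resolve_block_size_shift 23 22    # prints 23 (user asked for 8MB objects)
```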

Thanks
Yuan


Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support

2015-02-11 Thread Liu Yuan
On Thu, Feb 12, 2015 at 11:33:16AM +0900, Teruaki Ishizaki wrote:
 (2015/02/12 11:19), Liu Yuan wrote:
 On Thu, Feb 12, 2015 at 10:51:25AM +0900, Teruaki Ishizaki wrote:
 (2015/02/10 20:12), Liu Yuan wrote:
 On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
 Previously, qemu block driver of sheepdog used hard-coded VDI object size.
 This patch enables users to handle block_size_shift value for
 calculating VDI object size.
 
 When you start qemu, you don't need to specify additional command option.
 
 But when you create the VDI which doesn't have default object size
 with qemu-img command, you specify block_size_shift option.
 
  If you want to create a VDI of 8MB (1 << 23) object size,
 you need to specify following command option.
 
   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
 
 In addition, when you don't specify qemu-img command option,
 a default value of sheepdog cluster is used for creating VDI.
 
   # qemu-img create sheepdog:test2 100M
 
 Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
 ---
 V4:
   - Limit a read/write buffer size for creating a preallocated VDI.
   - Replace a parse function for the block_size_shift option.
   - Fix an error message.
 
 V3:
   - Delete the needless operation of buffer.
   - Delete the needless operations of request header.
 for SD_OP_GET_CLUSTER_DEFAULT.
   - Fix coding style problems.
 
 V2:
   - Fix coding style problem (white space).
   - Add members, store_policy and block_size_shift to struct 
  SheepdogVdiReq.
   - Initialize request header to use block_size_shift specified by user.
 ---
   block/sheepdog.c  |  138 
  ++---
   include/block/block_int.h |1 +
   2 files changed, 119 insertions(+), 20 deletions(-)
 
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index be3176f..a43b947 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -37,6 +37,7 @@
   #define SD_OP_READ_VDIS  0x15
   #define SD_OP_FLUSH_VDI  0x16
   #define SD_OP_DEL_VDI    0x17
 +#define SD_OP_GET_CLUSTER_DEFAULT   0x18
 
 This might not be necessary. For old qemu or the qemu-img without setting
 option, the block_size_shift will be 0.
 
 If we make 0 to represent 4MB object, then we don't need to get the default
 cluster object size.
 
 We might even get rid of the idea of cluster default size. The downside is
 that, if we want to create a vdi with a different object size than the
 default 4MB, we have to specify it every time for qemu-img or dog.
 
 If we choose to keep the idea of cluster default size, I think we'd also try
 to avoid calling this request from QEMU, to make backward compatibility
 easier. In this scenario, 0 might be used to ask a new sheep to decide to use
 the cluster default size.
 
 Both old and new QEMU will send 0 to sheep, and both old and new sheep can
 handle 0, though it has different meanings.
 
 Table for this bit as 0:
 Qe: qemu
 SD: Sheep daemon
 CDS: Cluster Default Size
 Ign: Ignored by the sheep daemon
 
 Qe/SD   new    old
 new     CDS    Ign
 old     CDS    NULL
 Does Ign mean that the VDI is handled with a 4MB object size?
 
 Yes, an old sheep can only handle 4MB objects and doesn't check this field at all.
 
 
 
 I think this approach is acceptable. The difference to your patch is that
 we don't send SD_OP_GET_CLUSTER_DEFAULT to sheep daemon and
 SD_OP_GET_CLUSTER_DEFAULT can be removed.
 When users create a new VDI with qemu-img, qemu's Sheepdog backend
 driver calculates max limit VDI size.
 
 But if block_size_shift option is not specified, qemu's Sheepdog backend
 driver can't calculate max limit VDI size.
 
  If block_size_shift is not specified, this means
 
 1 for old sheep, use 4MB size
 2 for new sheep, use cluster wide default value.
 
 And sheep then can calculate it on its own, no?
 
 The dog command (client) calculates the max size, so I think
 that qemu's Sheepdog backend driver should calculate it
 like the dog command does.
 
 Is that policy changeable?

I checked the QEMU code and got your idea. In the past the size was fixed, so it
was very easy to hardcode the check in the client; no communication with sheep
was needed.

Yes, if it is reasonable, we can change it.

I think we can push the size calculation logic into sheep; if the size is not
right, it returns INVALID_PARAMETER to clients. Clients just check this and
report the error back to users.

There is no backward compatibility problem with this approach, since 4MB is the
smallest size.

Old QEMU will limit the max size to 4TB, which is no problem for new sheep.

Thanks
Yuan


[sheepdog] who is in charge of Jenkins-sheepdog

2015-02-11 Thread Liu Yuan
Hi all,

   who has the right permission to install yasm on the sheepdog Jenkins server?
Our list is flooded with build-failure mail every day; please save us from it.

Thanks
Yuan


Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support

2015-02-11 Thread Teruaki Ishizaki

(2015/02/10 20:12), Liu Yuan wrote:

On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:

Previously, qemu block driver of sheepdog used hard-coded VDI object size.
This patch enables users to handle block_size_shift value for
calculating VDI object size.

When you start qemu, you don't need to specify additional command option.

But when you create the VDI which doesn't have default object size
with qemu-img command, you specify block_size_shift option.

If you want to create a VDI of 8MB (1 << 23) object size,
you need to specify following command option.

  # qemu-img create -o block_size_shift=23 sheepdog:test1 100M

In addition, when you don't specify qemu-img command option,
a default value of sheepdog cluster is used for creating VDI.

  # qemu-img create sheepdog:test2 100M

Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
---
V4:
  - Limit a read/write buffer size for creating a preallocated VDI.
  - Replace a parse function for the block_size_shift option.
  - Fix an error message.

V3:
  - Delete the needless operation of buffer.
  - Delete the needless operations of request header.
for SD_OP_GET_CLUSTER_DEFAULT.
  - Fix coding style problems.

V2:
  - Fix coding style problem (white space).
  - Add members, store_policy and block_size_shift to struct SheepdogVdiReq.
  - Initialize request header to use block_size_shift specified by user.
---
  block/sheepdog.c  |  138 ++---
  include/block/block_int.h |1 +
  2 files changed, 119 insertions(+), 20 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index be3176f..a43b947 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -37,6 +37,7 @@
  #define SD_OP_READ_VDIS  0x15
  #define SD_OP_FLUSH_VDI  0x16
  #define SD_OP_DEL_VDI    0x17
+#define SD_OP_GET_CLUSTER_DEFAULT   0x18


This might not be necessary. For old qemu or the qemu-img without setting
option, the block_size_shift will be 0.

If we make 0 to represent 4MB object, then we don't need to get the default
cluster object size.

We might even get rid of the idea of cluster default size. The downside is that,
if we want to create a vdi with a different object size than the default 4MB,
we have to specify it every time for qemu-img or dog.

If we choose to keep the idea of cluster default size, I think we'd also try to
avoid calling this request from QEMU, to make backward compatibility easier. In
this scenario, 0 might be used to ask a new sheep to decide to use the cluster
default size.

Both old and new QEMU will send 0 to sheep, and both old and new sheep can
handle 0, though it has different meanings.

Table for this bit as 0:
Qe: qemu
SD: Sheep daemon
CDS: Cluster Default Size
Ign: Ignored by the sheep daemon

Qe/SD   new    old
new     CDS    Ign
old     CDS    NULL

Does Ign mean that the VDI is handled with a 4MB object size?



I think this approach is acceptable. The difference to your patch is that
we don't send SD_OP_GET_CLUSTER_DEFAULT to sheep daemon and
SD_OP_GET_CLUSTER_DEFAULT can be removed.

When users create a new VDI with qemu-img, qemu's Sheepdog backend
driver calculates the max VDI size limit.
But if the block_size_shift option is not specified, qemu's Sheepdog backend
driver can't calculate the max VDI size limit.

So, I think that qemu's Sheepdog backend driver must get the cluster default
value from the sheep daemon.

Thanks,
Teruaki




Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support

2015-02-11 Thread Teruaki Ishizaki

(2015/02/12 11:19), Liu Yuan wrote:

On Thu, Feb 12, 2015 at 10:51:25AM +0900, Teruaki Ishizaki wrote:

(2015/02/10 20:12), Liu Yuan wrote:

On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:

Previously, qemu block driver of sheepdog used hard-coded VDI object size.
This patch enables users to handle block_size_shift value for
calculating VDI object size.

When you start qemu, you don't need to specify additional command option.

But when you create the VDI which doesn't have default object size
with qemu-img command, you specify block_size_shift option.

If you want to create a VDI of 8MB (1 << 23) object size,
you need to specify following command option.

  # qemu-img create -o block_size_shift=23 sheepdog:test1 100M

In addition, when you don't specify qemu-img command option,
a default value of sheepdog cluster is used for creating VDI.

  # qemu-img create sheepdog:test2 100M

Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
---
V4:
  - Limit a read/write buffer size for creating a preallocated VDI.
  - Replace a parse function for the block_size_shift option.
  - Fix an error message.

V3:
  - Delete the needless operation of buffer.
  - Delete the needless operations of request header.
for SD_OP_GET_CLUSTER_DEFAULT.
  - Fix coding style problems.

V2:
  - Fix coding style problem (white space).
  - Add members, store_policy and block_size_shift to struct SheepdogVdiReq.
  - Initialize request header to use block_size_shift specified by user.
---
  block/sheepdog.c  |  138 ++---
  include/block/block_int.h |1 +
  2 files changed, 119 insertions(+), 20 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index be3176f..a43b947 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -37,6 +37,7 @@
  #define SD_OP_READ_VDIS  0x15
  #define SD_OP_FLUSH_VDI  0x16
  #define SD_OP_DEL_VDI    0x17
+#define SD_OP_GET_CLUSTER_DEFAULT   0x18


This might not be necessary. For old qemu or the qemu-img without setting
option, the block_size_shift will be 0.

If we make 0 to represent 4MB object, then we don't need to get the default
cluster object size.

We might even get rid of the idea of cluster default size. The downside is that,
if we want to create a vdi with a different object size than the default 4MB,
we have to specify it every time for qemu-img or dog.

If we choose to keep the idea of cluster default size, I think we'd also try to
avoid calling this request from QEMU, to make backward compatibility easier. In
this scenario, 0 might be used to ask a new sheep to decide to use the cluster
default size.

Both old and new QEMU will send 0 to sheep, and both old and new sheep can
handle 0, though it has different meanings.

Table for this bit as 0:
Qe: qemu
SD: Sheep daemon
CDS: Cluster Default Size
Ign: Ignored by the sheep daemon

Qe/SD   new    old
new     CDS    Ign
old     CDS    NULL

Does Ign mean that the VDI is handled with a 4MB object size?


Yes, an old sheep can only handle 4MB objects and doesn't check this field at all.





I think this approach is acceptable. The difference to your patch is that
we don't send SD_OP_GET_CLUSTER_DEFAULT to sheep daemon and
SD_OP_GET_CLUSTER_DEFAULT can be removed.

When users create a new VDI with qemu-img, qemu's Sheepdog backend
driver calculates the max VDI size limit.

But if the block_size_shift option is not specified, qemu's Sheepdog backend
driver can't calculate the max VDI size limit.


If block_size_shift is not specified, this means:

1 for old sheep, use the 4MB size
2 for new sheep, use the cluster-wide default value.

And sheep can then calculate it on its own, no?


The dog command (client) calculates the max size, so I think
that qemu's Sheepdog backend driver should calculate it
like the dog command does.

Is that policy changeable?
Is there no policy?

Thanks,
Teruaki


Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots

2015-02-11 Thread Liu Yuan
On Thu, Feb 12, 2015 at 04:40:56PM +0900, Hitoshi Mitake wrote:
 At Thu, 12 Feb 2015 15:31:15 +0800,
 Liu Yuan wrote:
  
  On Thu, Feb 12, 2015 at 03:59:51PM +0900, Hitoshi Mitake wrote:
   At Thu, 12 Feb 2015 14:38:37 +0800,
   Liu Yuan wrote:

On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote:
 The current dog vdi snapshot command creates a new snapshot
 unconditionally, even if the working VDI doesn't have its own
 objects. In such a case, the created snapshot is redundant because
 the same VDI already exists.

What kind of use case will create two identical snapshots? This logic is
simple and the code is clean, but I doubt there are real users of this option.
   
    Generally speaking, taking snapshots periodically is an ordinary use case
    for an enterprise SAN, and of course sheepdog can support this use case.
    In the case of sheepdog, a cron job (e.g. daily) which invokes dog vdi
    snapshot simply enables it.

    But if a VDI doesn't have COWed objects, the snapshot will be
    redundant. So I want to add this option.
  
   Okay, your patch makes sense for periodic snapshots. But if dog finds
   identical snapshots, it won't create a new one and will return success to
   the caller.

   I assume the caller is some middleware; if no new vdi is returned, will
   this cause trouble for it? It would then need to call 'vdi list' to check
   whether a new vdi was created or not.
 
 So I'm adding this feature with a new option. The existing semantics
 aren't affected. And if the checking process (has_own_objects()) hits an
 error, it is reported correctly to the middleware.
 
  
  I'm not against this patch, but I have some questions. For identical
  snapshots, the overhead is just an inode object created, no? The overhead
  looks quite small to me, with no need for a special option to remove it.
 
 Taking snapshots of thousands of VDIs will consume thousands of VIDs,
 and create thousands * replication-factor inodes. I'm not sure whether the
 consumption of VIDs will become a serious problem, but the inodes will make
 replication take longer (e.g. 16:4 EC requires 20 inodes).

Yes, this is the point. Makes sense to me. I had some comments on the code in
my last email; could you submit a V2? BTW, it would be great if you could
include the above rationale in the commit log.
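The periodic-snapshot use case discussed above is typically driven by a
one-line cron job; this is a hypothetical sketch (the vdi name, schedule, and
tag format are placeholders, and the new skip-identical option would be added
once its final name is settled):

```shell
# crontab entry: snapshot 'myvdi' daily at 02:00, tagged with the date.
# With this patch, days where the vdi has no COWed objects would no
# longer produce a redundant snapshot.
0 2 * * * dog vdi snapshot -s daily-$(date +\%F) myvdi
```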

Thanks
Yuan


[sheepdog] [PATCH 4/5] tests: fix content of 052.out

2015-02-11 Thread Wang dongxu
Since the code is printf("%s\n", sd_strerror(rsp->result));, 052.out should add
a new line.

Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com
---
 tests/functional/052.out | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/functional/052.out b/tests/functional/052.out
index 2a533d5..f4487d0 100644
--- a/tests/functional/052.out
+++ b/tests/functional/052.out
@@ -52,6 +52,7 @@ Failed to read object 807c2b25 Waiting for other nodes to join cluster
 Failed to read inode header
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag   Block Size Shift
 Cluster status: Waiting for other nodes to join cluster
+
 Failed to read object 807c2b25 Waiting for other nodes to join cluster
 Failed to read inode header
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag   Block Size Shift
-- 
2.1.0





[sheepdog] [PATCH 0/5] tests: fix some test cases to suitable for new sheepdog and QEMU

2015-02-11 Thread Wang dongxu
QEMU and sheepdog changed some output formats while upgrading to new versions,
so the tests/functional test cases need some changes.

Wang dongxu (5):
  tests: avoid qemu-io warning
  tests: avoid qemu-img snapshot warning
  tests: correct vdi list
  tests: fix content of 052.out
  tests:fix vnode strategy output

 tests/functional/013 |  8 
 tests/functional/017 | 14 +++---
 tests/functional/024 |  6 +++---
 tests/functional/025 |  4 ++--
 tests/functional/030.out |  1 +
 tests/functional/039 | 22 +++---
 tests/functional/052.out |  1 +
 tests/functional/058 |  2 +-
 tests/functional/059 |  2 +-
 tests/functional/073.out |  2 +-
 tests/functional/075 |  2 +-
 tests/functional/081.out |  6 +++---
 tests/functional/082.out |  6 +++---
 tests/functional/087.out | 10 +-
 tests/functional/089.out |  2 +-
 tests/functional/090.out |  6 +++---
 tests/functional/096.out |  2 ++
 tests/functional/099.out |  4 ++--
 18 files changed, 52 insertions(+), 48 deletions(-)

-- 
2.1.0





[sheepdog] [PATCH 1/5] tests: avoid qemu-io warning

2015-02-11 Thread Wang dongxu
The qemu-io command added a warning message because probing a raw image is
dangerous, so add the -f option to avoid it.

Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com
---
 tests/functional/013 |  6 +++---
 tests/functional/017 |  2 +-
 tests/functional/024 |  6 +++---
 tests/functional/025 |  4 ++--
 tests/functional/039 | 22 +++---
 tests/functional/058 |  2 +-
 tests/functional/059 |  2 +-
 tests/functional/075 |  2 +-
 8 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/tests/functional/013 b/tests/functional/013
index b35b806..f724841 100755
--- a/tests/functional/013
+++ b/tests/functional/013
@@ -14,11 +14,11 @@ _cluster_format -c 1
 
 _vdi_create test 4G
 for i in `seq 1 9`; do
-$QEMU_IO -c write 0 512 -P $i sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P $i sheepdog:test | _filter_qemu_io
 $QEMU_IMG snapshot -c tag$i sheepdog:test
 done
 
-$QEMU_IO -c read 0 512 -P 9 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 9 sheepdog:test | _filter_qemu_io
 for i in `seq 1 9`; do
-$QEMU_IO -c read 0 512 -P $i sheepdog:test:tag$i | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P $i sheepdog:test:tag$i | _filter_qemu_io
 done
diff --git a/tests/functional/017 b/tests/functional/017
index 5ebe7da..1c22c76 100755
--- a/tests/functional/017
+++ b/tests/functional/017
@@ -20,7 +20,7 @@ $QEMU_IMG snapshot -c tag3 sheepdog:test
 _vdi_create test2 4G
 $QEMU_IMG snapshot -c tag1 sheepdog:test2
 $QEMU_IMG snapshot -c tag2 sheepdog:test2
-$QEMU_IO -c write 0 512 sheepdog:test2:1 | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 sheepdog:test2:1 | _filter_qemu_io
 $QEMU_IMG snapshot -c tag3 sheepdog:test2
 
 $DOG vdi tree | _filter_short_date
diff --git a/tests/functional/024 b/tests/functional/024
index e1c1180..e8a33c4 100755
--- a/tests/functional/024
+++ b/tests/functional/024
@@ -23,14 +23,14 @@ _vdi_create ${VDI_NAME} ${VDI_SIZE}
 sleep 1
 
 echo filling ${VDI_NAME} with data
-$QEMU_IO -c write 0 ${VDI_SIZE} sheepdog:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 ${VDI_SIZE} sheepdog:${VDI_NAME} | _filter_qemu_io
 
 echo reading back ${VDI_NAME}
-$QEMU_IO -c read 0 1m sheepdog:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 1m sheepdog:${VDI_NAME} | _filter_qemu_io
 
 echo starting second sheep
 _start_sheep 6
 _wait_for_sheep 7
 
 echo reading data from second sheep
-$QEMU_IO -c read 0 ${VDI_SIZE} sheepdog:localhost:7001:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 ${VDI_SIZE} sheepdog:localhost:7001:${VDI_NAME} | _filter_qemu_io
diff --git a/tests/functional/025 b/tests/functional/025
index 8f89ccb..37af0ea 100755
--- a/tests/functional/025
+++ b/tests/functional/025
@@ -26,10 +26,10 @@ echo creating vdi ${NAME}
 $DOG vdi create ${VDI_NAME} ${VDI_SIZE}
 
 echo filling ${VDI_NAME} with data
-$QEMU_IO -c write 0 ${VDI_SIZE} sheepdog:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 ${VDI_SIZE} sheepdog:${VDI_NAME} | _filter_qemu_io
 
 echo reading back ${VDI_NAME} from second zone
-$QEMU_IO -c read 0 1m sheepdog:localhost:7002:${VDI_NAME} | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 1m sheepdog:localhost:7002:${VDI_NAME} | _filter_qemu_io
 
 echo starting a sheep in the third zone
 for i in `seq 3 3`; do
diff --git a/tests/functional/039 b/tests/functional/039
index 5b2540f..fddd4fb 100755
--- a/tests/functional/039
+++ b/tests/functional/039
@@ -13,37 +13,37 @@ _wait_for_sheep 6
 _cluster_format -c 6
 _vdi_create test 4G
 
-$QEMU_IO -c write 0 512 -P 1 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P 1 sheepdog:test | _filter_qemu_io
 $DOG vdi snapshot test -s snap1
-$QEMU_IO -c write 0 512 -P 2 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P 2 sheepdog:test | _filter_qemu_io
 
 echo yes | $DOG vdi rollback test -s snap1
-$QEMU_IO -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
 $DOG vdi tree | _filter_short_date
 _vdi_list
 
-$QEMU_IO -c write 0 512 -P 2 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P 2 sheepdog:test | _filter_qemu_io
 $DOG vdi snapshot test -s snap2
-$QEMU_IO -c write 0 512 -P 3 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c write 0 512 -P 3 sheepdog:test | _filter_qemu_io
 
 echo yes | $DOG vdi rollback test -s snap1
-$QEMU_IO -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
 $DOG vdi tree | _filter_short_date
 _vdi_list
 
 echo yes | $DOG vdi rollback test -s snap2
-$QEMU_IO -c read 0 512 -P 2 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 2 sheepdog:test | _filter_qemu_io
 $DOG vdi tree | _filter_short_date
 _vdi_list
 
 echo yes | $DOG vdi rollback test -s snap1
-$QEMU_IO -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
+$QEMU_IO -f raw -c read 0 512 -P 1 sheepdog:test | _filter_qemu_io
 $DOG vdi tree | _filter_short_date
 _vdi_list
 

Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support

2015-02-11 Thread Hitoshi Mitake
At Thu, 12 Feb 2015 15:00:49 +0800,
Liu Yuan wrote:
 
 On Thu, Feb 12, 2015 at 03:19:21PM +0900, Hitoshi Mitake wrote:
  At Tue, 10 Feb 2015 18:35:58 +0800,
  Liu Yuan wrote:
   
   On Tue, Feb 10, 2015 at 06:56:33PM +0900, Teruaki Ishizaki wrote:
(2015/02/10 17:58), Liu Yuan wrote:
On Tue, Feb 10, 2015 at 05:22:02PM +0900, Teruaki Ishizaki wrote:
(2015/02/10 12:10), Liu Yuan wrote:
On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
Previously, qemu block driver of sheepdog used hard-coded VDI 
object size.
This patch enables users to handle block_size_shift value for
calculating VDI object size.

When you start qemu, you don't need to specify additional command 
option.

But when you create the VDI which doesn't have default object size
with qemu-img command, you specify block_size_shift option.

If you want to create a VDI of 8MB (1 << 23) object size,
you need to specify following command option.

  # qemu-img create -o block_size_shift=23 sheepdog:test1 100M

In addition, when you don't specify qemu-img command option,
a default value of sheepdog cluster is used for creating VDI.

  # qemu-img create sheepdog:test2 100M

Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
---
V4:
  - Limit a read/write buffer size for creating a preallocated VDI.
  - Replace a parse function for the block_size_shift option.
  - Fix an error message.

V3:
  - Delete the needless operation of buffer.
  - Delete the needless operations of request header.
for SD_OP_GET_CLUSTER_DEFAULT.
  - Fix coding style problems.

V2:
  - Fix coding style problem (white space).
  - Add members, store_policy and block_size_shift to struct 
 SheepdogVdiReq.
  - Initialize request header to use block_size_shift specified by 
 user.
---
  block/sheepdog.c  |  138 ++---
  include/block/block_int.h |1 +
  2 files changed, 119 insertions(+), 20 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index be3176f..a43b947 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -37,6 +37,7 @@
  #define SD_OP_READ_VDIS  0x15
  #define SD_OP_FLUSH_VDI  0x16
  #define SD_OP_DEL_VDI0x17
+#define SD_OP_GET_CLUSTER_DEFAULT   0x18

  #define SD_FLAG_CMD_WRITE0x01
  #define SD_FLAG_CMD_COW  0x02
@@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq {
  uint32_t base_vdi_id;
  uint8_t copies;
  uint8_t copy_policy;
-uint8_t reserved[2];
+uint8_t store_policy;
+uint8_t block_size_shift;
  uint32_t snapid;
  uint32_t type;
  uint32_t pad[2];
@@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp {
  uint32_t pad[5];
  } SheepdogVdiRsp;

+typedef struct SheepdogClusterRsp {
+uint8_t proto_ver;
+uint8_t opcode;
+uint16_t flags;
+uint32_t epoch;
+uint32_t id;
+uint32_t data_length;
+uint32_t result;
+uint8_t nr_copies;
+uint8_t copy_policy;
+uint8_t block_size_shift;
+uint8_t __pad1;
+uint32_t __pad2[6];
+} SheepdogClusterRsp;
+
  typedef struct SheepdogInode {
  char name[SD_MAX_VDI_LEN];
  char tag[SD_MAX_VDI_TAG_LEN];
@@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
  hdr.vdi_size = s->inode.vdi_size;
  hdr.copy_policy = s->inode.copy_policy;
  hdr.copies = s->inode.nr_copies;
+hdr.block_size_shift = s->inode.block_size_shift;

  ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);

@@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
  static int sd_prealloc(const char *filename, Error **errp)
  {
  BlockDriverState *bs = NULL;
+BDRVSheepdogState *base = NULL;
+unsigned long buf_size;
  uint32_t idx, max_idx;
+uint32_t object_size;
  int64_t vdi_size;
-void *buf = g_malloc0(SD_DATA_OBJ_SIZE);
+void *buf = NULL;
  int ret;

  ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
@@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, Error **errp)
  ret = vdi_size;
  goto out;
  }
-max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE);
+
+base = bs->opaque;
+object_size = (UINT32_C(1) << base->inode.block_size_shift);
+buf_size = MIN(object_size, SD_DATA_OBJ_SIZE);
+buf = g_malloc0(buf_size);
+
+max_idx = DIV_ROUND_UP(vdi_size, buf_size);

  for (idx = 0; idx < max_idx; idx++) {

Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots

2015-02-11 Thread Liu Yuan
On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote:
 Current dog vdi snapshot command creates a new snapshot
 unconditionally, even if a working VDI doesn't have its own
 objects. In such a case, the created snapshot is redundant because
 same VDI is already existing.

What kind of use case will create two identical snapshots? This logic is simple
and the code is clean, but I doubt there are real users of this option.

 
 This patch adds a new option -R to the dog command for reducing
 the identical snapshots.
 
 Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
 ---
  dog/vdi.c | 48 +++-
  1 file changed, 47 insertions(+), 1 deletion(-)
 
 diff --git a/dog/vdi.c b/dog/vdi.c
 index 8e612af..ee465c2 100644
 --- a/dog/vdi.c
 +++ b/dog/vdi.c
 @@ -40,6 +40,8 @@ static struct sd_option vdi_options[] = {
  neither comparing nor repairing},
   {'z', block_size_shift, true, specify the bit shift num for
   data object size},
 + {'R', reduce-identical-snapshots, false, do not create snapshot if 
 +  working VDI doesn't have its own objects},
   { 0, NULL, false, NULL },
  };
  
 @@ -61,6 +63,7 @@ static struct vdi_cmd_data {
   uint64_t oid;
   bool no_share;
   bool exist;
 + bool reduce_identical_snapshots;
  } vdi_cmd_data = { ~0, };
  
  struct get_vdi_info {
 @@ -605,6 +608,31 @@ fail:
   return NULL;
  }
  
 +static bool has_own_objects(uint32_t vid, int *ret)

Traditionally, we have functions return SD_RES_xxx because in this way we
can propagate the ret to upper callers.

So it is better to have has_own_objects return SD_RES_xxx for consistency.

Thanks,
Yuan
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support

2015-02-11 Thread Liu Yuan
On Thu, Feb 12, 2015 at 04:28:01PM +0900, Hitoshi Mitake wrote:
 At Thu, 12 Feb 2015 15:00:49 +0800,
 Liu Yuan wrote:
  
  On Thu, Feb 12, 2015 at 03:19:21PM +0900, Hitoshi Mitake wrote:
   At Tue, 10 Feb 2015 18:35:58 +0800,
   Liu Yuan wrote:

On Tue, Feb 10, 2015 at 06:56:33PM +0900, Teruaki Ishizaki wrote:
 (2015/02/10 17:58), Liu Yuan wrote:
 On Tue, Feb 10, 2015 at 05:22:02PM +0900, Teruaki Ishizaki wrote:
 (2015/02/10 12:10), Liu Yuan wrote:
 On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
 Previously, qemu block driver of sheepdog used hard-coded VDI 
 object size.
 This patch enables users to handle block_size_shift value for
 calculating VDI object size.
 
 When you start qemu, you don't need to specify additional command 
 option.
 
 But when you create the VDI which doesn't have default object size
 with qemu-img command, you specify block_size_shift option.
 
 If you want to create a VDI of 8MB (1 << 23) object size,
 you need to specify following command option.
 
   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
 
 In addition, when you don't specify qemu-img command option,
 a default value of sheepdog cluster is used for creating VDI.
 
   # qemu-img create sheepdog:test2 100M
 
 Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
 ---
 V4:
   - Limit a read/write buffer size for creating a preallocated 
  VDI.
   - Replace a parse function for the block_size_shift option.
   - Fix an error message.
 
 V3:
   - Delete the needless operation of buffer.
   - Delete the needless operations of request header.
 for SD_OP_GET_CLUSTER_DEFAULT.
   - Fix coding style problems.
 
 V2:
   - Fix coding style problem (white space).
   - Add members, store_policy and block_size_shift to struct 
  SheepdogVdiReq.
   - Initialize request header to use block_size_shift specified 
  by user.
 ---
   block/sheepdog.c  |  138 
  ++---
   include/block/block_int.h |1 +
   2 files changed, 119 insertions(+), 20 deletions(-)
 
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index be3176f..a43b947 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -37,6 +37,7 @@
   #define SD_OP_READ_VDIS  0x15
   #define SD_OP_FLUSH_VDI  0x16
   #define SD_OP_DEL_VDI0x17
 +#define SD_OP_GET_CLUSTER_DEFAULT   0x18
 
   #define SD_FLAG_CMD_WRITE0x01
   #define SD_FLAG_CMD_COW  0x02
 @@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq {
   uint32_t base_vdi_id;
   uint8_t copies;
   uint8_t copy_policy;
 -uint8_t reserved[2];
 +uint8_t store_policy;
 +uint8_t block_size_shift;
   uint32_t snapid;
   uint32_t type;
   uint32_t pad[2];
 @@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp {
   uint32_t pad[5];
   } SheepdogVdiRsp;
 
 +typedef struct SheepdogClusterRsp {
 +uint8_t proto_ver;
 +uint8_t opcode;
 +uint16_t flags;
 +uint32_t epoch;
 +uint32_t id;
 +uint32_t data_length;
 +uint32_t result;
 +uint8_t nr_copies;
 +uint8_t copy_policy;
 +uint8_t block_size_shift;
 +uint8_t __pad1;
 +uint32_t __pad2[6];
 +} SheepdogClusterRsp;
 +
   typedef struct SheepdogInode {
   char name[SD_MAX_VDI_LEN];
   char tag[SD_MAX_VDI_TAG_LEN];
 @@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
   hdr.vdi_size = s->inode.vdi_size;
   hdr.copy_policy = s->inode.copy_policy;
   hdr.copies = s->inode.nr_copies;
 +hdr.block_size_shift = s->inode.block_size_shift;
 
   ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
 
 @@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
   static int sd_prealloc(const char *filename, Error **errp)
   {
   BlockDriverState *bs = NULL;
 +BDRVSheepdogState *base = NULL;
 +unsigned long buf_size;
   uint32_t idx, max_idx;
 +uint32_t object_size;
   int64_t vdi_size;
 -void *buf = g_malloc0(SD_DATA_OBJ_SIZE);
 +void *buf = NULL;
   int ret;
 
   ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
 @@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, Error **errp)
   ret = vdi_size;
   goto out;
   }
 -max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE);
 +
 +base = bs->opaque;
 +object_size = (UINT32_C(1) <<

Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots

2015-02-11 Thread Hitoshi Mitake
At Thu, 12 Feb 2015 15:31:15 +0800,
Liu Yuan wrote:
 
 On Thu, Feb 12, 2015 at 03:59:51PM +0900, Hitoshi Mitake wrote:
  At Thu, 12 Feb 2015 14:38:37 +0800,
  Liu Yuan wrote:
   
   On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote:
Current dog vdi snapshot command creates a new snapshot
unconditionally, even if a working VDI doesn't have its own
objects. In such a case, the created snapshot is redundant because
same VDI is already existing.
   
   What kind of use case will create two identical snapshots? This logic is 
   simple
   and code is clean, but I doubt if there is real users of this option.
  
  Generally speaking, taking snapshots periodically is an ordinary use case
  of enterprise SAN. Of course sheepdog can support this use case. In
  sheepdog's case, a cron job (e.g. daily) which simply invokes dog vdi
  snapshot enables it.
  
  But if a VDI doesn't have COWed objects, the snapshot will be
  redundant. So I want to add this option.
 
 Okay, your patch makes sense for periodic snapshots. But if dog has found
 identical snapshots, it won't create a new one and will return success to the
 caller.
 
 I assume the caller is some middleware; if there is no new vdi returned, will
 this cause trouble for it? This means it will need to call 'vdi list' to check
 whether a new vdi was created or not?

So I'm adding this feature with the new option. The existing semantics
aren't affected. And if the checking process (has_own_objects()) hits an
error, it is reported correctly to the middleware.

 
 I'm not against this patch, but I have some questions. For identical
 snapshots, the overhead is just an inode object created, no? The overhead
 looks quite small to me, with no need for a special option to remove it.

Taking snapshots of thousands of VDIs will consume thousands of VIDs,
and create thousands * replication factor of inodes. I'm not sure whether the
consumption of VIDs will become a serious problem, but the inodes will make
replication time longer (e.g. 16:4 ec requires 20 inodes).

Thanks,
Hitoshi

 
 Thanks
 Yuan
 -- 
 sheepdog mailing list
 sheepdog@lists.wpkg.org
 https://lists.wpkg.org/mailman/listinfo/sheepdog
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


[sheepdog] [PATCH] sheepdog:show more detail of the crash source

2015-02-11 Thread Wang Zhengyong
In the current sheepdog, when sheepdog crashes,
there is too little information about the signal source.

This patch uses (*handler)(int, siginfo_t *, void *)
instead of (*handler)(int). In this way, we can show more detail of the
crash problem, especially the pid of the signal sender.

Cc: Hitoshi Mitake mitake.hito...@gmail.com
Signed-off-by: Wang Zhengyong wangzhengy...@cmss.chinamobile.com
---
 dog/dog.c   |2 +-
 include/util.h  |4 ++--
 lib/logger.c|4 ++--
 lib/util.c  |   11 +++
 sheep/sheep.c   |9 +
 shepherd/shepherd.c |2 +-
 6 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/dog/dog.c b/dog/dog.c
index 54520dd..77aa27b 100644
--- a/dog/dog.c
+++ b/dog/dog.c
@@ -368,7 +368,7 @@ static const struct sd_option *build_sd_options(const char *opts)
return sd_opts;
 }
 
-static void crash_handler(int signo)
+static void crash_handler(int signo, siginfo_t *info, void *context)
 {
	sd_err("dog exits unexpectedly (%s).", strsignal(signo));
 
diff --git a/include/util.h b/include/util.h
index 6a513e0..3c34b40 100644
--- a/include/util.h
+++ b/include/util.h
@@ -108,8 +108,8 @@ int rmdir_r(const char *dir_path);
 int purge_directory(const char *dir_path);
 bool is_numeric(const char *p);
 const char *data_to_str(void *data, size_t data_length);
-int install_sighandler(int signum, void (*handler)(int), bool once);
-int install_crash_handler(void (*handler)(int));
+int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once);
+int install_crash_handler(void (*handler)(int, siginfo_t *, void *));
 void reraise_crash_signal(int signo, int status);
 pid_t gettid(void);
 int tkill(int tid, int sig);
diff --git a/lib/logger.c b/lib/logger.c
index 02bab00..da0ebac 100644
--- a/lib/logger.c
+++ b/lib/logger.c
@@ -531,7 +531,7 @@ static bool is_sheep_dead(int signo)
return signo == SIGHUP;
 }
 
-static void crash_handler(int signo)
+static void crash_handler(int signo, siginfo_t *info, void *context)
 {
if (is_sheep_dead(signo))
		sd_err("sheep pid %d exited unexpectedly.", sheep_pid);
@@ -552,7 +552,7 @@ static void crash_handler(int signo)
reraise_crash_signal(signo, 1);
 }
 
-static void sighup_handler(int signo)
+static void sighup_handler(int signo, siginfo_t *info, void *context)
 {
rotate_log();
 }
diff --git a/lib/util.c b/lib/util.c
index 21e0143..089455d 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -524,25 +524,28 @@ const char *data_to_str(void *data, size_t data_length)
  * If 'once' is true, the signal will be restored to the default state
  * after 'handler' is called.
  */
-int install_sighandler(int signum, void (*handler)(int), bool once)
+int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once)
 {
struct sigaction sa = {};
 
sa.sa_handler = handler;
+   sa.sa_flags = SA_SIGINFO;
+
if (once)
-   sa.sa_flags = SA_RESETHAND | SA_NODEFER;
+   sa.sa_flags = sa.sa_flags | SA_RESETHAND | SA_NODEFER;
	sigemptyset(&sa.sa_mask);
 
	return sigaction(signum, &sa, NULL);
 }
 
-int install_crash_handler(void (*handler)(int))
+int install_crash_handler(void (*handler)(int, siginfo_t *, void *))
 {
return install_sighandler(SIGSEGV, handler, true) ||
install_sighandler(SIGABRT, handler, true) ||
install_sighandler(SIGBUS, handler, true) ||
install_sighandler(SIGILL, handler, true) ||
-   install_sighandler(SIGFPE, handler, true);
+   install_sighandler(SIGFPE, handler, true) ||
+   install_sighandler(SIGQUIT, handler, true);
 }
 
 /*
diff --git a/sheep/sheep.c b/sheep/sheep.c
index e0a034f..6c540ae 100644
--- a/sheep/sheep.c
+++ b/sheep/sheep.c
@@ -239,7 +239,7 @@ static void signal_handler(int listen_fd, int events, void *data)
 
ret = read(sigfd, siginfo, sizeof(siginfo));
assert(ret == sizeof(siginfo));
-	sd_debug("signal %d", siginfo.ssi_signo);
+	sd_debug("signal %d, ssi pid %d", siginfo.ssi_signo, siginfo.ssi_pid);
switch (siginfo.ssi_signo) {
case SIGTERM:
sys-cinfo.status = SD_STATUS_KILLED;
@@ -276,9 +276,10 @@ static int init_signal(void)
return 0;
 }
 
-static void crash_handler(int signo)
+static void crash_handler(int signo, siginfo_t *info, void *context)
 {
-	sd_emerg("sheep exits unexpectedly (%s).", strsignal(signo));
+	sd_emerg("sheep exits unexpectedly (%s), si pid %d, uid %d, errno %d, code %d",
+		 strsignal(signo), info->si_pid, info->si_uid, info->si_errno, info->si_code);
 
sd_backtrace();
sd_dump_variable(__sys);
@@ -639,7 +640,7 @@ end:
return status;
 }
 
-static void sighup_handler(int signum)
+static void sighup_handler(int signo, siginfo_t *info, void *context)
 {
if (unlikely(logger_pid == -1))
return;
diff --git 

[sheepdog] [PATCH 2/5] tests: avoid qemu-img snapshot warning

2015-02-11 Thread Wang dongxu
The qemu-img snapshot command will print a warning message while probing a raw
image, so filter it out using sed.

Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com
---
 tests/functional/013 |  2 +-
 tests/functional/017 | 12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/functional/013 b/tests/functional/013
index f724841..d19d8f8 100755
--- a/tests/functional/013
+++ b/tests/functional/013
@@ -15,7 +15,7 @@ _cluster_format -c 1
 _vdi_create test 4G
 for i in `seq 1 9`; do
 $QEMU_IO -f raw -c write 0 512 -P $i sheepdog:test | _filter_qemu_io
-$QEMU_IMG snapshot -c tag$i sheepdog:test
+$QEMU_IMG snapshot -c tag$i sheepdog:test 2>&1 | sed '/WARNING/, +2 d'
 done
 
 $QEMU_IO -f raw -c read 0 512 -P 9 sheepdog:test | _filter_qemu_io
diff --git a/tests/functional/017 b/tests/functional/017
index 1c22c76..2c34a55 100755
--- a/tests/functional/017
+++ b/tests/functional/017
@@ -13,14 +13,14 @@ _wait_for_sheep 6
 _cluster_format -c 1
 
 _vdi_create test 4G
-$QEMU_IMG snapshot -c tag1 sheepdog:test
-$QEMU_IMG snapshot -c tag2 sheepdog:test
-$QEMU_IMG snapshot -c tag3 sheepdog:test
+$QEMU_IMG snapshot -c tag1 sheepdog:test 2>&1 | sed '/WARNING/, +2 d'
+$QEMU_IMG snapshot -c tag2 sheepdog:test 2>&1 | sed '/WARNING/, +2 d'
+$QEMU_IMG snapshot -c tag3 sheepdog:test 2>&1 | sed '/WARNING/, +2 d'
 
 _vdi_create test2 4G
+$QEMU_IMG snapshot -c tag1 sheepdog:test2 2>&1 | sed '/WARNING/, +2 d'
+$QEMU_IMG snapshot -c tag2 sheepdog:test2 2>&1 | sed '/WARNING/, +2 d'
 $QEMU_IO -f raw -c write 0 512 sheepdog:test2:1 | _filter_qemu_io
-$QEMU_IMG snapshot -c tag3 sheepdog:test2
+$QEMU_IMG snapshot -c tag3 sheepdog:test2 2>&1 | sed '/WARNING/, +2 d'
 
 $DOG vdi tree | _filter_short_date
-- 
2.1.0



-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


[sheepdog] [PATCH 3/5] tests: correct vdi list

2015-02-11 Thread Wang dongxu
dog vdi list added a Block Size Shift column, so add it to the test case outputs.

Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com
---
 tests/functional/073.out |  2 +-
 tests/functional/081.out |  6 +++---
 tests/functional/082.out |  6 +++---
 tests/functional/087.out | 10 +-
 tests/functional/089.out |  2 +-
 tests/functional/090.out |  6 +++---
 tests/functional/099.out |  4 ++--
 7 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/tests/functional/073.out b/tests/functional/073.out
index 8dd2173..3c5fd47 100644
--- a/tests/functional/073.out
+++ b/tests/functional/073.out
@@ -6,6 +6,6 @@ Cluster created at DATE
 
 Epoch Time   Version [Host:Port:V-Nodes,,,]
 DATE  1 [127.0.0.1:7000:128, 127.0.0.1:7001:128, 127.0.0.1:7002:128]
-  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag
+  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag   Block Size Shift
   test 0  4.0 MB  0.0 MB DATE   7c2b25  3  hello
diff --git a/tests/functional/081.out b/tests/functional/081.out
index 92df8ca..7092e97 100644
--- a/tests/functional/081.out
+++ b/tests/functional/081.out
@@ -55,7 +55,7 @@ vdi.c
  HTTP/1.1 416 Requested Range Not Satisfiable
  HTTP/1.1 416 Requested Range Not Satisfiable
  HTTP/1.1 416 Requested Range Not Satisfiable
-  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag
+  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag   Block Size Shift
   sd/dog   0   16 PB   56 MB  0.0 MB DATE   5a5cbf4:2  
   sd   0   16 PB  8.0 MB  0.0 MB DATE   7927f24:2  
   sd/sheep 0   16 PB  144 MB  0.0 MB DATE   8ad11e4:2  
@@ -65,7 +65,7 @@ data137
 data19
 data4
 data97
-  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag
+  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag   Block Size Shift
   sd/dog   0   16 PB   56 MB  0.0 MB DATE   5a5cbf4:2  
   sd   0   16 PB  8.0 MB  0.0 MB DATE   7927f24:2  
   sd/sheep 0   16 PB  144 MB  0.0 MB DATE   8ad11e4:2  
@@ -73,7 +73,7 @@ data97
   sd/sheep/allocator 0   16 PB  268 MB  0.0 MB DATE   fd57fc4:2
  
 dog
 sheep
-  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag
+  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag   Block Size Shift
   sd/dog   0   16 PB   56 MB  0.0 MB DATE   5a5cbf4:2  
   sd   0   16 PB  8.0 MB  0.0 MB DATE   7927f24:2  
   sd/sheep 0   16 PB  144 MB  0.0 MB DATE   8ad11e4:2  
diff --git a/tests/functional/082.out b/tests/functional/082.out
index b3f4dd9..78c5e6a 100644
--- a/tests/functional/082.out
+++ b/tests/functional/082.out
@@ -60,7 +60,7 @@ trace.c
 treeview.c
 trunk.c
 vdi.c
-  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag
+  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag   Block Size Shift
   sd/dog   0   16 PB   56 MB  0.0 MB DATE   5a5cbf4:2  
   sd   0   16 PB  8.0 MB  0.0 MB DATE   7927f24:2  
   sd/sheep 0   16 PB  176 MB  0.0 MB DATE   8ad11e4:2  
@@ -78,7 +78,7 @@ data6
 data7
 data8
 data9
-  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag
+  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag   Block Size Shift
   sd/dog   0   16 PB   56 MB  0.0 MB DATE   5a5cbf4:2  
   sd   0   16 PB  8.0 MB  0.0 MB DATE   7927f24:2  
   sd/sheep 0   16 PB  176 MB  0.0 MB DATE   8ad11e4:2  
@@ -86,7 +86,7 @@ data9
   sd/sheep/allocator 0   16 PB  316 MB  0.0 MB DATE   fd57fc4:2
  
 dog
 sheep
-  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag
+  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag   Block Size Shift
   sd/dog   0   16 PB   56 MB  0.0 MB DATE   5a5cbf4:2  
   sd   0   16 PB  8.0 MB  0.0 MB DATE   7927f24:2  
   sd/sheep 0   16 PB  176 MB  0.0 MB DATE   8ad11e4:2  
diff --git a/tests/functional/087.out b/tests/functional/087.out
index 04e4210..0fcc7f3 100644
--- a/tests/functional/087.out
+++ b/tests/functional/087.out
@@ -2,11 +2,11 @@ QA output created by 087
 using backend plain store
 206
 206
-  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag
+  NameIdSizeUsed  SharedCreation time   VDI id  Copies  Tag   Block Size Shift
   sd   0   16 PB  4.0 MB  0.0 MB DATE   7927f24:2  
   sd/sheep 0   16 PB  8.0 MB  0.0 MB DATE   8ad11e4:2  
   sd/sheep/allocator 0   16 PB  

Re: [sheepdog] [PATCH v2] sheepdog:show more detail of the crash source

2015-02-11 Thread Hitoshi Mitake
At Wed, 11 Feb 2015 23:44:53 -0800,
Wang Zhengyong wrote:
 
 In the current sheepdog, when sheepdog crashes,
 there is too little information about the signal source.
 
 This patch uses (*handler)(int, siginfo_t *, void *)
 instead of (*handler)(int). In this way, we can show more detail of the
 crash problem, especially the pid of the signal sender.
 
 Cc: Hitoshi Mitake mitake.hito...@gmail.com
 Signed-off-by: Wang Zhengyong wangzhengy...@cmss.chinamobile.com
 ---
 v2: fix the wrong handler assignment
 ---

Sorry, I missed some style problems in the previous version; checkpatch
reports them like below:

WARNING: line over 80 characters
#72: FILE: include/util.h:111:
+int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once);

WARNING: line over 80 characters
#108: FILE: lib/util.c:527:
+int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once)

WARNING: line over 80 characters
#159: FILE: sheep/sheep.c:282:
+   strsignal(signo), info->si_pid, info->si_uid, info->si_errno, info->si_code);

Could you fix them? Then I can apply it.

Thanks,
Hitoshi

  dog/dog.c   |2 +-
  include/util.h  |4 ++--
  lib/logger.c|4 ++--
  lib/util.c  |   13 -
  sheep/sheep.c   |9 +
  shepherd/shepherd.c |2 +-
  6 files changed, 19 insertions(+), 15 deletions(-)
 
 diff --git a/dog/dog.c b/dog/dog.c
 index 54520dd..77aa27b 100644
 --- a/dog/dog.c
 +++ b/dog/dog.c
 @@ -368,7 +368,7 @@ static const struct sd_option *build_sd_options(const char *opts)
   return sd_opts;
  }
  
 -static void crash_handler(int signo)
 +static void crash_handler(int signo, siginfo_t *info, void *context)
  {
  	sd_err("dog exits unexpectedly (%s).", strsignal(signo));
  
 diff --git a/include/util.h b/include/util.h
 index 6a513e0..3c34b40 100644
 --- a/include/util.h
 +++ b/include/util.h
 @@ -108,8 +108,8 @@ int rmdir_r(const char *dir_path);
  int purge_directory(const char *dir_path);
  bool is_numeric(const char *p);
  const char *data_to_str(void *data, size_t data_length);
 -int install_sighandler(int signum, void (*handler)(int), bool once);
 -int install_crash_handler(void (*handler)(int));
 +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once);
 +int install_crash_handler(void (*handler)(int, siginfo_t *, void *));
  void reraise_crash_signal(int signo, int status);
  pid_t gettid(void);
  int tkill(int tid, int sig);
 diff --git a/lib/logger.c b/lib/logger.c
 index 02bab00..da0ebac 100644
 --- a/lib/logger.c
 +++ b/lib/logger.c
 @@ -531,7 +531,7 @@ static bool is_sheep_dead(int signo)
   return signo == SIGHUP;
  }
  
 -static void crash_handler(int signo)
 +static void crash_handler(int signo, siginfo_t *info, void *context)
  {
   if (is_sheep_dead(signo))
   	sd_err("sheep pid %d exited unexpectedly.", sheep_pid);
 @@ -552,7 +552,7 @@ static void crash_handler(int signo)
   reraise_crash_signal(signo, 1);
  }
  
 -static void sighup_handler(int signo)
 +static void sighup_handler(int signo, siginfo_t *info, void *context)
  {
   rotate_log();
  }
 diff --git a/lib/util.c b/lib/util.c
 index 21e0143..e217629 100644
 --- a/lib/util.c
 +++ b/lib/util.c
 @@ -524,25 +524,28 @@ const char *data_to_str(void *data, size_t data_length)
   * If 'once' is true, the signal will be restored to the default state
   * after 'handler' is called.
   */
 -int install_sighandler(int signum, void (*handler)(int), bool once)
 -int install_sighandler(int signum, void (*handler)(int), bool once)
 +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), bool once)
  {
   struct sigaction sa = {};
  
 - sa.sa_handler = handler;
 + sa.sa_sigaction = handler;
 + sa.sa_flags = SA_SIGINFO;
 +
   if (once)
 - sa.sa_flags = SA_RESETHAND | SA_NODEFER;
 + sa.sa_flags = sa.sa_flags | SA_RESETHAND | SA_NODEFER;
  	sigemptyset(&sa.sa_mask);
  
  	return sigaction(signum, &sa, NULL);
  }
  
 -int install_crash_handler(void (*handler)(int))
 +int install_crash_handler(void (*handler)(int, siginfo_t *, void *))
  {
   return install_sighandler(SIGSEGV, handler, true) ||
   install_sighandler(SIGABRT, handler, true) ||
   install_sighandler(SIGBUS, handler, true) ||
   install_sighandler(SIGILL, handler, true) ||
 - install_sighandler(SIGFPE, handler, true);
 + install_sighandler(SIGFPE, handler, true) ||
 + install_sighandler(SIGQUIT, handler, true);
  }
  
  /*
 diff --git a/sheep/sheep.c b/sheep/sheep.c
 index e0a034f..6c540ae 100644
 --- a/sheep/sheep.c
 +++ b/sheep/sheep.c
 @@ -239,7 +239,7 @@ static void signal_handler(int listen_fd, int events, void *data)
  
   ret = read(sigfd, &siginfo, sizeof(siginfo));
   assert(ret == sizeof(siginfo));
 - sd_debug("signal %d", siginfo.ssi_signo);
 + sd_debug("signal %d, ssi pid %d", siginfo.ssi_signo, siginfo.ssi_pid);
   switch 

[sheepdog] [PATCH v2] sheepdog:show more detail of the crash source

2015-02-11 Thread Wang Zhengyong
In the current sheepdog, when sheepdog crashes,
there is too little information about the signal source.

This patch uses (*handler)(int, siginfo_t *, void *)
instead of (*handler)(int). In this way, it can show more detail about the
crash, especially the pid of the signal sender.

Cc: Hitoshi Mitake mitake.hito...@gmail.com
Signed-off-by: Wang Zhengyong wangzhengy...@cmss.chinamobile.com
---
v2: fix the wrong handler assignment
---
 dog/dog.c   |2 +-
 include/util.h  |4 ++--
 lib/logger.c|4 ++--
 lib/util.c  |   13 -
 sheep/sheep.c   |9 +
 shepherd/shepherd.c |2 +-
 6 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/dog/dog.c b/dog/dog.c
index 54520dd..77aa27b 100644
--- a/dog/dog.c
+++ b/dog/dog.c
@@ -368,7 +368,7 @@ static const struct sd_option *build_sd_options(const char 
*opts)
return sd_opts;
 }
 
-static void crash_handler(int signo)
+static void crash_handler(int signo, siginfo_t *info, void *context)
 {
	sd_err("dog exits unexpectedly (%s).", strsignal(signo));
 
diff --git a/include/util.h b/include/util.h
index 6a513e0..3c34b40 100644
--- a/include/util.h
+++ b/include/util.h
@@ -108,8 +108,8 @@ int rmdir_r(const char *dir_path);
 int purge_directory(const char *dir_path);
 bool is_numeric(const char *p);
 const char *data_to_str(void *data, size_t data_length);
-int install_sighandler(int signum, void (*handler)(int), bool once);
-int install_crash_handler(void (*handler)(int));
+int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), 
bool once);
+int install_crash_handler(void (*handler)(int, siginfo_t *, void *));
 void reraise_crash_signal(int signo, int status);
 pid_t gettid(void);
 int tkill(int tid, int sig);
diff --git a/lib/logger.c b/lib/logger.c
index 02bab00..da0ebac 100644
--- a/lib/logger.c
+++ b/lib/logger.c
@@ -531,7 +531,7 @@ static bool is_sheep_dead(int signo)
return signo == SIGHUP;
 }
 
-static void crash_handler(int signo)
+static void crash_handler(int signo, siginfo_t *info, void *context)
 {
if (is_sheep_dead(signo))
	sd_err("sheep pid %d exited unexpectedly.", sheep_pid);
@@ -552,7 +552,7 @@ static void crash_handler(int signo)
reraise_crash_signal(signo, 1);
 }
 
-static void sighup_handler(int signo)
+static void sighup_handler(int signo, siginfo_t *info, void *context)
 {
rotate_log();
 }
diff --git a/lib/util.c b/lib/util.c
index 21e0143..e217629 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -524,25 +524,28 @@ const char *data_to_str(void *data, size_t data_length)
  * If 'once' is true, the signal will be restored to the default state
  * after 'handler' is called.
  */
-int install_sighandler(int signum, void (*handler)(int), bool once)
+int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void *), 
bool once)
 {
struct sigaction sa = {};
 
-   sa.sa_handler = handler;
+   sa.sa_sigaction = handler;
+   sa.sa_flags = SA_SIGINFO;
+
if (once)
-   sa.sa_flags = SA_RESETHAND | SA_NODEFER;
+   sa.sa_flags = sa.sa_flags | SA_RESETHAND | SA_NODEFER;
	sigemptyset(&sa.sa_mask);
 
	return sigaction(signum, &sa, NULL);
 }
 
-int install_crash_handler(void (*handler)(int))
+int install_crash_handler(void (*handler)(int, siginfo_t *, void *))
 {
return install_sighandler(SIGSEGV, handler, true) ||
install_sighandler(SIGABRT, handler, true) ||
install_sighandler(SIGBUS, handler, true) ||
install_sighandler(SIGILL, handler, true) ||
-   install_sighandler(SIGFPE, handler, true);
+   install_sighandler(SIGFPE, handler, true) ||
+   install_sighandler(SIGQUIT, handler, true);
 }
 
 /*
diff --git a/sheep/sheep.c b/sheep/sheep.c
index e0a034f..6c540ae 100644
--- a/sheep/sheep.c
+++ b/sheep/sheep.c
@@ -239,7 +239,7 @@ static void signal_handler(int listen_fd, int events, void 
*data)
 
	ret = read(sigfd, &siginfo, sizeof(siginfo));
	assert(ret == sizeof(siginfo));
-	sd_debug("signal %d", siginfo.ssi_signo);
+	sd_debug("signal %d, ssi pid %d", siginfo.ssi_signo, siginfo.ssi_pid);
switch (siginfo.ssi_signo) {
case SIGTERM:
sys-cinfo.status = SD_STATUS_KILLED;
@@ -276,9 +276,10 @@ static int init_signal(void)
return 0;
 }
 
-static void crash_handler(int signo)
+static void crash_handler(int signo, siginfo_t *info, void *context)
 {
-	sd_emerg("sheep exits unexpectedly (%s).", strsignal(signo));
+	sd_emerg("sheep exits unexpectedly (%s), si pid %d, uid %d, errno %d, code %d",
+		strsignal(signo), info->si_pid, info->si_uid, info->si_errno, info->si_code);
 
sd_backtrace();
sd_dump_variable(__sys);
@@ -639,7 +640,7 @@ end:
return status;
 }
 
-static void sighup_handler(int signum)
+static void sighup_handler(int signo, siginfo_t *info, void *context)
 {
 

[sheepdog] [PATCH 5/5] tests:fix vnode strategy output

2015-02-11 Thread Wang dongxu
Since commit 5fed9d6, cluster vnodes strategy information is printed,
so fix the expected output in the test cases.

Signed-off-by: Wang dongxu wangdon...@cmss.chinamobile.com
---
 tests/functional/030.out | 1 +
 tests/functional/096.out | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/tests/functional/030.out b/tests/functional/030.out
index 2c788f0..baf51a7 100644
--- a/tests/functional/030.out
+++ b/tests/functional/030.out
@@ -37,6 +37,7 @@ s test22   10 MB   12 MB  0.0 MB DATE   fd3816  3 
   22
   test20   10 MB  0.0 MB   12 MB DATE   fd3817  322
 Cluster status: running, auto-recovery enabled
 Cluster store: plain with 6 redundancy policy
+Cluster vnodes strategy: auto
 Cluster vnode mode: node
 Cluster created at DATE
 
diff --git a/tests/functional/096.out b/tests/functional/096.out
index 2ff9dc6..a555287 100644
--- a/tests/functional/096.out
+++ b/tests/functional/096.out
@@ -27,6 +27,7 @@ $ ../../dog/dog cluster format -c 3
 $ ../../dog/dog cluster info -v
 Cluster status: running, auto-recovery enabled
 Cluster store: plain with 3 redundancy policy
+Cluster vnodes strategy: auto
 Cluster vnode mode: node
 Cluster created at DATE
 
@@ -80,6 +81,7 @@ The cluster's redundancy level is set to 2, the old one was 3.
 $ ../../dog/dog cluster info -v
 Cluster status: running, auto-recovery enabled
 Cluster store: plain with 2 redundancy policy
+Cluster vnodes strategy: auto
 Cluster vnode mode: node
 Cluster created at DATE
 
-- 
2.1.0



-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support

2015-02-11 Thread Liu Yuan
On Thu, Feb 12, 2015 at 03:19:21PM +0900, Hitoshi Mitake wrote:
 At Tue, 10 Feb 2015 18:35:58 +0800,
 Liu Yuan wrote:
  
  On Tue, Feb 10, 2015 at 06:56:33PM +0900, Teruaki Ishizaki wrote:
   (2015/02/10 17:58), Liu Yuan wrote:
   On Tue, Feb 10, 2015 at 05:22:02PM +0900, Teruaki Ishizaki wrote:
   (2015/02/10 12:10), Liu Yuan wrote:
   On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
   Previously, qemu block driver of sheepdog used hard-coded VDI object 
   size.
   This patch enables users to handle block_size_shift value for
   calculating VDI object size.
   
   When you start qemu, you don't need to specify additional command 
   option.
   
   But when you create the VDI which doesn't have default object size
   with qemu-img command, you specify block_size_shift option.
   
    If you want to create a VDI of 8MB(1 << 23) object size,
   you need to specify following command option.
   
 # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
   
   In addition, when you don't specify qemu-img command option,
   a default value of sheepdog cluster is used for creating VDI.
   
 # qemu-img create sheepdog:test2 100M
   
   Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
   ---
   V4:
 - Limit a read/write buffer size for creating a preallocated VDI.
 - Replace a parse function for the block_size_shift option.
 - Fix an error message.
   
   V3:
 - Delete the needless operation of buffer.
 - Delete the needless operations of request header.
   for SD_OP_GET_CLUSTER_DEFAULT.
 - Fix coding style problems.
   
   V2:
 - Fix coding style problem (white space).
 - Add members, store_policy and block_size_shift to struct 
SheepdogVdiReq.
 - Initialize request header to use block_size_shift specified by 
user.
   ---
 block/sheepdog.c  |  138 
++---
 include/block/block_int.h |1 +
 2 files changed, 119 insertions(+), 20 deletions(-)
   
   diff --git a/block/sheepdog.c b/block/sheepdog.c
   index be3176f..a43b947 100644
   --- a/block/sheepdog.c
   +++ b/block/sheepdog.c
   @@ -37,6 +37,7 @@
 #define SD_OP_READ_VDIS  0x15
 #define SD_OP_FLUSH_VDI  0x16
 #define SD_OP_DEL_VDI0x17
   +#define SD_OP_GET_CLUSTER_DEFAULT   0x18
   
 #define SD_FLAG_CMD_WRITE0x01
 #define SD_FLAG_CMD_COW  0x02
   @@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq {
 uint32_t base_vdi_id;
 uint8_t copies;
 uint8_t copy_policy;
   -uint8_t reserved[2];
   +uint8_t store_policy;
   +uint8_t block_size_shift;
 uint32_t snapid;
 uint32_t type;
 uint32_t pad[2];
   @@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp {
 uint32_t pad[5];
 } SheepdogVdiRsp;
   
   +typedef struct SheepdogClusterRsp {
   +uint8_t proto_ver;
   +uint8_t opcode;
   +uint16_t flags;
   +uint32_t epoch;
   +uint32_t id;
   +uint32_t data_length;
   +uint32_t result;
   +uint8_t nr_copies;
   +uint8_t copy_policy;
   +uint8_t block_size_shift;
   +uint8_t __pad1;
   +uint32_t __pad2[6];
   +} SheepdogClusterRsp;
   +
 typedef struct SheepdogInode {
 char name[SD_MAX_VDI_LEN];
 char tag[SD_MAX_VDI_TAG_LEN];
   @@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, 
   uint32_t *vdi_id, int snapshot,
  hdr.vdi_size = s->inode.vdi_size;
  hdr.copy_policy = s->inode.copy_policy;
  hdr.copies = s->inode.nr_copies;
    +hdr.block_size_shift = s->inode.block_size_shift;
    
  ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
   
   @@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, 
   uint32_t *vdi_id, int snapshot,
 static int sd_prealloc(const char *filename, Error **errp)
 {
 BlockDriverState *bs = NULL;
   +BDRVSheepdogState *base = NULL;
   +unsigned long buf_size;
 uint32_t idx, max_idx;
   +uint32_t object_size;
 int64_t vdi_size;
   -void *buf = g_malloc0(SD_DATA_OBJ_SIZE);
   +void *buf = NULL;
 int ret;
   
 ret = bdrv_open(bs, filename, NULL, NULL, BDRV_O_RDWR | 
BDRV_O_PROTOCOL,
   @@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, 
   Error **errp)
 ret = vdi_size;
 goto out;
 }
   -max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE);
   +
    +base = bs->opaque;
    +object_size = (UINT32_C(1) << base->inode.block_size_shift);
    +buf_size = MIN(object_size, SD_DATA_OBJ_SIZE);
    +buf = g_malloc0(buf_size);
    +
    +max_idx = DIV_ROUND_UP(vdi_size, buf_size);
    
  for (idx = 0; idx < max_idx; idx++) {
 /*
  * The created image can be a cloned image, so we need to 
read
  * a data from the source image.
  */
   -ret = bdrv_pread(bs, idx 

Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots

2015-02-11 Thread Hitoshi Mitake
At Thu, 12 Feb 2015 14:38:37 +0800,
Liu Yuan wrote:
 
 On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote:
  Current dog vdi snapshot command creates a new snapshot
  unconditionally, even if a working VDI doesn't have its own
  objects. In such a case, the created snapshot is redundant because
  same VDI is already existing.
 
 What kind of use case will create two identical snapshots? This logic is
 simple and the code is clean, but I doubt there are real users of this option.

Generally speaking, taking snapshots periodically is an ordinary use case
of enterprise SAN. Of course sheepdog can support this use case. In the
case of sheepdog, a cron job (e.g. daily) which simply invokes dog vdi
snapshot enables it.

But if a VDI doesn't have COWed objects, the snapshot will be
redundant. So I want to add this option.

Of course, vdi list will provide information about the COWed objects
(the used field). But vdi list is a heavy operation in a cluster which has
many VDIs, because it issues a bunch of read requests for inode
headers. So avoiding vdi listing as much as possible is desirable.

Thanks,
Hitoshi

 
  
  This patch adds a new option -R to the dog command for reducing
  the identical snapshots.
  
  Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
  ---
   dog/vdi.c | 48 +++-
   1 file changed, 47 insertions(+), 1 deletion(-)
  
  diff --git a/dog/vdi.c b/dog/vdi.c
  index 8e612af..ee465c2 100644
  --- a/dog/vdi.c
  +++ b/dog/vdi.c
  @@ -40,6 +40,8 @@ static struct sd_option vdi_options[] = {
  	"neither comparing nor repairing"},
  	{'z', "block_size_shift", true, "specify the bit shift num for data object size"},
  +	{'R', "reduce-identical-snapshots", false, "do not create snapshot if working VDI doesn't have its own objects"},
  { 0, NULL, false, NULL },
   };
   
  @@ -61,6 +63,7 @@ static struct vdi_cmd_data {
  uint64_t oid;
  bool no_share;
  bool exist;
  +   bool reduce_identical_snapshots;
   } vdi_cmd_data = { ~0, };
   
   struct get_vdi_info {
  @@ -605,6 +608,31 @@ fail:
  return NULL;
   }
   
  +static bool has_own_objects(uint32_t vid, int *ret)
 
 Traditionally, we'll have functions return SD_RES_xxx because in this way we
 can propagate the ret to upper callers.
 
 So it is better to have has_own_objects return SD_RES_xxx for consistency.
 
 Thanks,
 Yuan
 -- 
 sheepdog mailing list
 sheepdog@lists.wpkg.org
 https://lists.wpkg.org/mailman/listinfo/sheepdog
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] [PATCH 1/2] dog: add a new option for reducing identical snapshots

2015-02-11 Thread Liu Yuan
On Thu, Feb 12, 2015 at 03:59:51PM +0900, Hitoshi Mitake wrote:
 At Thu, 12 Feb 2015 14:38:37 +0800,
 Liu Yuan wrote:
  
  On Mon, Feb 09, 2015 at 05:25:48PM +0900, Hitoshi Mitake wrote:
   Current dog vdi snapshot command creates a new snapshot
   unconditionally, even if a working VDI doesn't have its own
   objects. In such a case, the created snapshot is redundant because
   same VDI is already existing.
  
  What kind of use case will create two identical snapshots? This logic is 
  simple
  and code is clean, but I doubt if there is real users of this option.
 
 Generally speaking, taking snapshot periodically is an ordinal usecase
 of enterprise SAN. Of course sheepdog can support this use case. In a
 case of sheepdog, making cron job (e.g. daily) which invokes dog vdi
 snapshot simply enables it.
 
 But if a VDI doesn't have COWed objects, the snapshot will be
 redundant. So I want to add this option.

Okay, your patch makes sense for periodic snapshots. But if dog finds
identical snapshots, it won't create a new one and will return success to the caller.

I assume the caller is some middleware. If no new vdi is returned, will
this cause trouble for it? Does this mean it will need to call 'vdi list' to check
whether a new vdi was created or not?

I'm not against this patch, but I have some questions. For identical snapshots,
the overhead is just one inode object created, no? The overhead looks quite
small to me, so it may not need a special option to remove it.

Thanks
Yuan
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] [PATCH] sheepdog:show more detail of the crash source

2015-02-11 Thread Hitoshi Mitake
At Wed, 11 Feb 2015 21:47:41 -0800,
Wang Zhengyong wrote:
 
 In the current sheepdog, when sheepdog crashes,
 there is too little information about the signal source.
 
 This patch uses (*handler)(int, siginfo_t *, void *)
 instead of (*handler)(int). In this way, it can show more detail about the
 crash, especially the pid of the signal sender.
 
 Cc: Hitoshi Mitake mitake.hito...@gmail.com
 Signed-off-by: Wang Zhengyong wangzhengy...@cmss.chinamobile.com
 ---
  dog/dog.c   |2 +-
  include/util.h  |4 ++--
  lib/logger.c|4 ++--
  lib/util.c  |   11 +++
  sheep/sheep.c   |9 +
  shepherd/shepherd.c |2 +-
  6 files changed, 18 insertions(+), 14 deletions(-)
 
 diff --git a/dog/dog.c b/dog/dog.c
 index 54520dd..77aa27b 100644
 --- a/dog/dog.c
 +++ b/dog/dog.c
 @@ -368,7 +368,7 @@ static const struct sd_option *build_sd_options(const 
 char *opts)
   return sd_opts;
  }
  
 -static void crash_handler(int signo)
 +static void crash_handler(int signo, siginfo_t *info, void *context)
  {
 	sd_err("dog exits unexpectedly (%s).", strsignal(signo));
  
 diff --git a/include/util.h b/include/util.h
 index 6a513e0..3c34b40 100644
 --- a/include/util.h
 +++ b/include/util.h
 @@ -108,8 +108,8 @@ int rmdir_r(const char *dir_path);
  int purge_directory(const char *dir_path);
  bool is_numeric(const char *p);
  const char *data_to_str(void *data, size_t data_length);
 -int install_sighandler(int signum, void (*handler)(int), bool once);
 -int install_crash_handler(void (*handler)(int));
 +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void 
 *), bool once);
 +int install_crash_handler(void (*handler)(int, siginfo_t *, void *));
  void reraise_crash_signal(int signo, int status);
  pid_t gettid(void);
  int tkill(int tid, int sig);
 diff --git a/lib/logger.c b/lib/logger.c
 index 02bab00..da0ebac 100644
 --- a/lib/logger.c
 +++ b/lib/logger.c
 @@ -531,7 +531,7 @@ static bool is_sheep_dead(int signo)
   return signo == SIGHUP;
  }
  
 -static void crash_handler(int signo)
 +static void crash_handler(int signo, siginfo_t *info, void *context)
  {
   if (is_sheep_dead(signo))
 	sd_err("sheep pid %d exited unexpectedly.", sheep_pid);
 @@ -552,7 +552,7 @@ static void crash_handler(int signo)
   reraise_crash_signal(signo, 1);
  }
  
 -static void sighup_handler(int signo)
 +static void sighup_handler(int signo, siginfo_t *info, void *context)
  {
   rotate_log();
  }
 diff --git a/lib/util.c b/lib/util.c
 index 21e0143..089455d 100644
 --- a/lib/util.c
 +++ b/lib/util.c
 @@ -524,25 +524,28 @@ const char *data_to_str(void *data, size_t data_length)
   * If 'once' is true, the signal will be restored to the default state
   * after 'handler' is called.
   */
 -int install_sighandler(int signum, void (*handler)(int), bool once)
 +int install_sighandler(int signum, void (*handler)(int, siginfo_t *, void 
 *), bool once)
  {
   struct sigaction sa = {};
  
   sa.sa_handler = handler;

The handler is now a function which should be called via
sa.sa_sigaction. You need to assign handler to the sa_sigaction.

Other part looks good to me.
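
For reference, a minimal standalone sketch of the setup the review asks for — assigning the three-argument handler via sa_sigaction with SA_SIGINFO set. This is a hypothetical self-contained example, not code from the patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

static volatile pid_t seen_pid;

/* Three-argument form: only valid when SA_SIGINFO is set in sa_flags. */
static void info_handler(int signo, siginfo_t *info, void *context)
{
	(void)signo;
	(void)context;
	seen_pid = info->si_pid;	/* pid of the process that sent the signal */
}

/* Install via sa_sigaction (not sa_handler), then signal ourselves once
 * and report the sender pid the handler observed. */
static pid_t demo(void)
{
	struct sigaction sa = { 0 };

	sa.sa_sigaction = info_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	kill(getpid(), SIGUSR1);	/* delivered before kill() returns */
	return seen_pid;
}
```

Since the process signals itself, the si_pid seen in the handler is the process's own pid.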

Thanks,
Hitoshi

 + sa.sa_flags = SA_SIGINFO;
 +
   if (once)
 - sa.sa_flags = SA_RESETHAND | SA_NODEFER;
 + sa.sa_flags = sa.sa_flags | SA_RESETHAND | SA_NODEFER;
   sigemptyset(&sa.sa_mask);
  
   return sigaction(signum, &sa, NULL);
  }
  
 -int install_crash_handler(void (*handler)(int))
 +int install_crash_handler(void (*handler)(int, siginfo_t *, void *))
  {
   return install_sighandler(SIGSEGV, handler, true) ||
   install_sighandler(SIGABRT, handler, true) ||
   install_sighandler(SIGBUS, handler, true) ||
   install_sighandler(SIGILL, handler, true) ||
 - install_sighandler(SIGFPE, handler, true);
 + install_sighandler(SIGFPE, handler, true) ||
 + install_sighandler(SIGQUIT, handler, true);
  }
  
  /*
 diff --git a/sheep/sheep.c b/sheep/sheep.c
 index e0a034f..6c540ae 100644
 --- a/sheep/sheep.c
 +++ b/sheep/sheep.c
 @@ -239,7 +239,7 @@ static void signal_handler(int listen_fd, int events, 
 void *data)
  
   ret = read(sigfd, &siginfo, sizeof(siginfo));
   assert(ret == sizeof(siginfo));
 - sd_debug("signal %d", siginfo.ssi_signo);
 + sd_debug("signal %d, ssi pid %d", siginfo.ssi_signo, siginfo.ssi_pid);
   switch (siginfo.ssi_signo) {
   case SIGTERM:
   sys-cinfo.status = SD_STATUS_KILLED;
 @@ -276,9 +276,10 @@ static int init_signal(void)
   return 0;
  }
  
 -static void crash_handler(int signo)
 +static void crash_handler(int signo, siginfo_t *info, void *context)
  {
 -	sd_emerg("sheep exits unexpectedly (%s).", strsignal(signo));
 +	sd_emerg("sheep exits unexpectedly (%s), si pid %d, uid %d, errno %d, code %d",
 +		strsignal(signo), info->si_pid, info->si_uid, info->si_errno, info->si_code);
  
   

[sheepdog] [PATCH] zookeeper: add more detailed description on how zk_watcher report states

2015-02-11 Thread Liu Yuan
From: Liu Yuan liuy...@cmss.chinamobile.com

Signed-off-by: Liu Yuan liuy...@cmss.chinamobile.com
---
 sheep/cluster/zookeeper.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c
index b603d36..e2ee248 100644
--- a/sheep/cluster/zookeeper.c
+++ b/sheep/cluster/zookeeper.c
@@ -685,6 +685,27 @@ static int add_event(enum zk_event_type type, struct 
zk_node *znode, void *buf,
}
 }
 
+/*
+ * Type value:
+ * -1 SESSION_EVENT, use State to indicate what kind of sub-event
+ *   State value:
+ *   -122 SESSION EXPIRED
+ *   1CONNECTING
+ *   3CONNECTED
+ * 1  CREATED_EVENT
+ * 2  DELETED_EVENT
+ * 3  CHANGED_EVENT
+ * 4  CHILD_EVENT
+ *
+ * While connection to zk is disconnected (zk cluster in election, network is
+ * broken, etc.), zk library will try to reconnect zk cluster on its own and
+ * report both the connection state and session state to sheep via zk_watcher.
+ *
+ * Once the connection is reestablished (state changed from 1 to 3) within the
+ * timeout window, the session is still valid, meaning that all the watchers
+ * will function as before. If not within the timeout window, zk_watcher will
+ * report to sheep that session is expired.
+ */
 static void zk_watcher(zhandle_t *zh, int type, int state, const char *path,
   void *ctx)
 {
@@ -693,6 +714,8 @@ static void zk_watcher(zhandle_t *zh, int type, int state, 
const char *path,
uint64_t lock_id;
int ret;
 
+	sd_debug("path:%s, type:%d, state:%d", path, type, state);
+
	if (type == ZOO_SESSION_EVENT && state == ZOO_EXPIRED_SESSION_STATE) {
/*
 * do reconnect in main thread to avoid on-the-fly zookeeper
@@ -702,8 +725,6 @@ static void zk_watcher(zhandle_t *zh, int type, int state, 
const char *path,
return;
}
 
-/* CREATED_EVENT 1, DELETED_EVENT 2, CHANGED_EVENT 3, CHILD_EVENT 4 */
-	sd_debug("path:%s, type:%d, state:%d", path, type, state);
	if (type == ZOO_CREATED_EVENT || type == ZOO_CHANGED_EVENT) {
		ret = sscanf(path, MEMBER_ZNODE "/%s", str);
if (ret == 1)
-- 
1.9.1

-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] object placement

2015-02-11 Thread Liu Yuan
On Sun, Feb 08, 2015 at 08:46:01PM +0100, Corin Langosch wrote:
 Hi guys,
 
 afaik sheepdog uses consistent hashing to map objects to nodes. But how do 
 you choose where the individual ec-chunks of
 an object should go? Consistent hashing cannot be used here because different 
 ec-chunks of the same object might get
 mapped to the same node.
 
 I looked at the sources and got an idea: you calculate a base node for the 
 object and then for each ec-chunk you add
 its index. Example: we have 10 nodes (0..9), consistent hashing calculates 
 node 7 for our object. As we have 4:2
 encoding, we get the ec-chunks stored on [7,8,9,0,1,2]?
 
 Is my assumption correct, or how do you actually do it?
 

Yes, you are right if we don't consider the virtual nodes added to consistent
hashing to mitigate the object migration problem.

But with virtual nodes in the picture, we can add one more rule to make sure
all the data and ec-chunks land on different physical nodes.
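
To make the base-plus-index rule concrete, here is a tiny sketch of it (a hypothetical illustration that ignores virtual nodes entirely; ec_chunk_node is not a real sheepdog function):

```c
/* Placement rule discussed above: consistent hashing picks a base node
 * for the object, and erasure-coded chunk i is stored on node
 * (base + i) mod nr_nodes. The real code works on the ring of virtual
 * nodes, which this sketch deliberately ignores. */
static int ec_chunk_node(int base_node, int chunk_idx, int nr_nodes)
{
	return (base_node + chunk_idx) % nr_nodes;
}
```

With base node 7, 4:2 encoding, and 10 nodes this yields [7, 8, 9, 0, 1, 2], matching the example above.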

Thanks
Yuan

-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] effective storing backups and deduplication

2015-02-11 Thread Liu Yuan
On Wed, Feb 11, 2015 at 03:57:32PM +0400, Vasiliy Tolstov wrote:
 Hi! I need to store user backups and allows to download it. I see in
 google that sheepdog supports deduplication, but can't find info in
 sheepdog docs about it. Does sheepdog support deduplication?

This deduplication is for SD's internal use, to store its own cluster snapshot.

 Also i think not use cluster wide snapshots because i want to dedicate
 backup server from other sheepdog nodes, so i need to copy user vdi
 from compute node to backup node. Does somebody can say how can i do
 that in optimal way?
 
 Thanks!

How about using sheepdog's RESTful storage to store the user's backups and serve downloads?

More detail see

https://github.com/sheepdog/sheepdog/wiki/HTTP-Simple-Storage

Thanks
Yuan
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


[sheepdog] effective storing backups and deduplication

2015-02-11 Thread Vasiliy Tolstov
Hi! I need to store user backups and allow users to download them. I see from
googling that sheepdog supports deduplication, but I can't find info about it
in the sheepdog docs. Does sheepdog support deduplication?

Also, I think I shouldn't use cluster-wide snapshots, because I want a dedicated
backup server separate from the other sheepdog nodes, so I need to copy user VDIs
from the compute nodes to the backup node. Can somebody say how I can do
that in an optimal way?

Thanks!

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] effective storing backups and deduplication

2015-02-11 Thread Vasiliy Tolstov
2015-02-11 15:08 GMT+03:00 Liu Yuan namei.u...@gmail.com:
 On Wed, Feb 11, 2015 at 03:57:32PM +0400, Vasiliy Tolstov wrote:
 Hi! I need to store user backups and allows to download it. I see in
 google that sheepdog supports deduplication, but can't find info in
 sheepdog docs about it. Does sheepdog support deduplication?

 This deduplication is for SD's internal use, to store its own cluster 
 snapshot.


So, if I have nearly identical backups (for example 5GB of data each
and only 1GB different), is the space needed for the two backups equal to 10GB?
How much work would vdi deduplication need?

 Also i think not use cluster wide snapshots because i want to dedicate
 backup server from other sheepdog nodes, so i need to copy user vdi
 from compute node to backup node. Does somebody can say how can i do
 that in optimal way?

 Thanks!

 How about sheepdog's RESTful storage to store user's backup and downloading?

 More detail see

 https://github.com/sheepdog/sheepdog/wiki/HTTP-Simple-Storage

 Thanks

Yes, I'm thinking about it, and now I'm trying to understand how to add
authentication and the other needed stuff.


-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] effective storing backups and deduplication

2015-02-11 Thread Vasiliy Tolstov
2015-02-11 15:28 GMT+03:00 Liu Yuan namei.u...@gmail.com:
 We need to know what the user's backups are. Are they whole VDIs or delta data for
 different VDIs?


The best scheme, as I see it, is:
1) If no backup exists for a vdi - create a full backup (simply
copy all data).
2) If a backup was already created - create a new backup and copy only the delta
from the previous backup.
3) If a user deletes an old backup - remove the garbage pieces that don't belong
to other VDIs.
4) In steps 1 to 2 - check other VDIs' pieces for duplicate
data and store only the difference. But I think this is very problematic
in this case.

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] effective storing backups and deduplication

2015-02-11 Thread Liu Yuan
On Wed, Feb 11, 2015 at 04:32:34PM +0400, Vasiliy Tolstov wrote:
 2015-02-11 15:28 GMT+03:00 Liu Yuan namei.u...@gmail.com:
  We need to what is user's backups. Is it the whole vdi or dalta data for
  different vdis?
 
 
 Best scheme as i think is:
 1) If backup not exists for vdi - create full backup (this is simple
 copy all data)
 2) If backup already created - create new backup and copy only delta
 from previous backup.
 3) If use delete old backup - remove garbage pieces that not belongs
 to other vdi.
 4) In case of steps from 1 to 2 - check other vdi pieces for duplicate
 data and store only difference. But i think this is very problematic
 in this case.

This scheme can be built on sheepdog's current features:

0 use qemu-img (recommended because of better performance) or dog to read the base
  vdi.

1 use dog to back up the delta data for the different snapshots taken by
  qemu-img snapshot or dog vdi snapshot.
 
2 manage the relations between the base and the delta data for the user-defined
  snapshots in the upper layer

3 use SD http storage to store the base and delta data.

I guess you need something as the middle layer to map the user-defined snapshots
to sheepdog's base and delta data, and to implement gc in this middle layer.
Authentication would also be better implemented in this middleware.
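
A rough sketch of what a record in such a middle layer might look like — purely hypothetical names, nothing like this exists in sheepdog itself: each user-defined snapshot resolves to a base object plus an ordered chain of delta blobs in SD's HTTP storage, with a refcount driving gc.

```c
#include <assert.h>
#include <string.h>

#define MAX_DELTAS 16

/* Hypothetical middle-layer record: a user-defined snapshot is resolved
 * to a full base image plus an ordered chain of delta blobs, all stored
 * as objects in SD's HTTP storage. refcount drives gc: a blob may only
 * be deleted when no snapshot references it any more. */
struct user_snapshot {
	char base_key[64];		/* key of the base vdi dump */
	char delta_keys[MAX_DELTAS][64];/* incremental deltas, oldest first */
	int nr_deltas;
	int refcount;
};

/* Append one more delta to the chain; returns 0 on success, -1 if full. */
static int snapshot_add_delta(struct user_snapshot *s, const char *key)
{
	if (s->nr_deltas >= MAX_DELTAS)
		return -1;
	strncpy(s->delta_keys[s->nr_deltas], key, sizeof(s->delta_keys[0]) - 1);
	s->delta_keys[s->nr_deltas][sizeof(s->delta_keys[0]) - 1] = '\0';
	s->nr_deltas++;
	return 0;
}
```

Restoring a snapshot would then mean fetching base_key and replaying the delta chain in order; gc walks all records and deletes blobs no chain references.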

Thanks
Yuan
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] effective storing backups and deduplication

2015-02-11 Thread Liu Yuan
On Wed, Feb 11, 2015 at 04:14:35PM +0400, Vasiliy Tolstov wrote:
 2015-02-11 15:08 GMT+03:00 Liu Yuan namei.u...@gmail.com:
  On Wed, Feb 11, 2015 at 03:57:32PM +0400, Vasiliy Tolstov wrote:
  Hi! I need to store user backups and allows to download it. I see in
  google that sheepdog supports deduplication, but can't find info in
  sheepdog docs about it. Does sheepdog support deduplication?
 
  This deduplication is for SD's internal use, to store its own cluster 
  snapshot.
 
 
 So, if i have nearly identicl backups (for example 5Gb of data each
 and only 1Gb is different) space needed for two backups equal 10Gb?
 How much work needed for vdi deduplication?

We need to know what the user's backups are. Are they whole VDIs or delta data for
different VDIs?

Cluster snapshot will snapshot the whole cluster and store it in a deduplicated
way; I don't think that is what you need.

 
  Also i think not use cluster wide snapshots because i want to dedicate
  backup server from other sheepdog nodes, so i need to copy user vdi
  from compute node to backup node. Does somebody can say how can i do
  that in optimal way?
 
  Thanks!
 
  How about sheepdog's RESTful storage to store user's backup and downloading?
 
  More detail see
 
  https://github.com/sheepdog/sheepdog/wiki/HTTP-Simple-Storage
 
  Thanks
 
 Yes i think about it and now i'm try to understand how to add
 authentication and other needed stuff.

You can reference OpenStack's Swift implementation, but feel free to choose
whatever you think is reasonable for an authentication implementation for sheepdog.

Thanks
Yuan
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog


Re: [sheepdog] [PATCH v4] sheepdog: selectable object size support

2015-02-11 Thread Hitoshi Mitake
At Tue, 10 Feb 2015 18:35:58 +0800,
Liu Yuan wrote:
 
 On Tue, Feb 10, 2015 at 06:56:33PM +0900, Teruaki Ishizaki wrote:
  (2015/02/10 17:58), Liu Yuan wrote:
  On Tue, Feb 10, 2015 at 05:22:02PM +0900, Teruaki Ishizaki wrote:
  (2015/02/10 12:10), Liu Yuan wrote:
  On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
  Previously, qemu block driver of sheepdog used hard-coded VDI object 
  size.
  This patch enables users to handle block_size_shift value for
  calculating VDI object size.
  
  When you start qemu, you don't need to specify additional command 
  option.
  
  But when you create the VDI which doesn't have default object size
  with qemu-img command, you specify block_size_shift option.
  
   If you want to create a VDI of 8MB(1 << 23) object size,
  you need to specify following command option.
  
# qemu-img create -o block_size_shift=23 sheepdog:test1 100M
  
  In addition, when you don't specify qemu-img command option,
  a default value of sheepdog cluster is used for creating VDI.
  
# qemu-img create sheepdog:test2 100M
  
  Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp
  ---
  V4:
- Limit a read/write buffer size for creating a preallocated VDI.
- Replace a parse function for the block_size_shift option.
- Fix an error message.
  
  V3:
- Delete the needless operation of buffer.
- Delete the needless operations of request header.
  for SD_OP_GET_CLUSTER_DEFAULT.
- Fix coding style problems.
  
  V2:
- Fix coding style problem (white space).
- Add members, store_policy and block_size_shift to struct 
   SheepdogVdiReq.
- Initialize request header to use block_size_shift specified by user.
  ---
block/sheepdog.c  |  138 
   ++---
include/block/block_int.h |1 +
2 files changed, 119 insertions(+), 20 deletions(-)
  
  diff --git a/block/sheepdog.c b/block/sheepdog.c
  index be3176f..a43b947 100644
  --- a/block/sheepdog.c
  +++ b/block/sheepdog.c
  @@ -37,6 +37,7 @@
#define SD_OP_READ_VDIS  0x15
#define SD_OP_FLUSH_VDI  0x16
#define SD_OP_DEL_VDI0x17
  +#define SD_OP_GET_CLUSTER_DEFAULT   0x18
  
#define SD_FLAG_CMD_WRITE0x01
#define SD_FLAG_CMD_COW  0x02
  @@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq {
uint32_t base_vdi_id;
uint8_t copies;
uint8_t copy_policy;
  -uint8_t reserved[2];
  +uint8_t store_policy;
  +uint8_t block_size_shift;
uint32_t snapid;
uint32_t type;
uint32_t pad[2];
  @@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp {
uint32_t pad[5];
} SheepdogVdiRsp;
  
  +typedef struct SheepdogClusterRsp {
  +uint8_t proto_ver;
  +uint8_t opcode;
  +uint16_t flags;
  +uint32_t epoch;
  +uint32_t id;
  +uint32_t data_length;
  +uint32_t result;
  +uint8_t nr_copies;
  +uint8_t copy_policy;
  +uint8_t block_size_shift;
  +uint8_t __pad1;
  +uint32_t __pad2[6];
  +} SheepdogClusterRsp;
  +
typedef struct SheepdogInode {
char name[SD_MAX_VDI_LEN];
char tag[SD_MAX_VDI_TAG_LEN];
  @@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, 
  uint32_t *vdi_id, int snapshot,
    hdr.vdi_size = s->inode.vdi_size;
    hdr.copy_policy = s->inode.copy_policy;
    hdr.copies = s->inode.nr_copies;
  +hdr.block_size_shift = s->inode.block_size_shift;
  
    ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen,
       &rlen);
  
  @@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, 
  uint32_t *vdi_id, int snapshot,
static int sd_prealloc(const char *filename, Error **errp)
{
BlockDriverState *bs = NULL;
  +BDRVSheepdogState *base = NULL;
  +unsigned long buf_size;
uint32_t idx, max_idx;
  +uint32_t object_size;
int64_t vdi_size;
  -void *buf = g_malloc0(SD_DATA_OBJ_SIZE);
  +void *buf = NULL;
int ret;
  
    ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | 
   BDRV_O_PROTOCOL,
  @@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, 
  Error **errp)
ret = vdi_size;
goto out;
}
  -max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE);
  +
  +base = bs->opaque;
  +object_size = (UINT32_C(1) << base->inode.block_size_shift);
  +buf_size = MIN(object_size, SD_DATA_OBJ_SIZE);
  +buf = g_malloc0(buf_size);
  +
  +max_idx = DIV_ROUND_UP(vdi_size, buf_size);
  
    for (idx = 0; idx < max_idx; idx++) {
/*
 * The created image can be a cloned image, so we need to read
 * a data from the source image.
 */
  -ret = bdrv_pread(bs, idx * SD_DATA_OBJ_SIZE, buf, SD_DATA_OBJ_SIZE);
  +ret = bdrv_pread(bs, idx * buf_size, buf, buf_size);
    if (ret < 0) {
goto out;
}
  -ret = bdrv_pwrite(bs, idx *