endless flying slow requests

2012-11-14 Thread Stefan Priebe - Profihost AG

Hello list,

I see this several times: endless flying slow requests that never stop
until I restart the OSD in question.


2012-11-14 10:11:57.513395 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for  31789.858457 secs
2012-11-14 10:11:57.513399 osd.24 [WRN] slow request 31789.858457 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:11:58.513584 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for  31790.858646 secs
2012-11-14 10:11:58.513586 osd.24 [WRN] slow request 31790.858646 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:11:59.513766 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for  31791.858827 secs
2012-11-14 10:11:59.513768 osd.24 [WRN] slow request 31791.858827 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:12:00.513909 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for  31792.858971 secs
2012-11-14 10:12:00.513916 osd.24 [WRN] slow request 31792.858971 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:12:01.514061 osd.24 [WRN] 1 slow requests, 1 included 
below; oldest blocked for  31793.859124 secs
2012-11-14 10:12:01.514063 osd.24 [WRN] slow request 31793.859124 
seconds old, received at 2012-11-14 01:22:07.654922: 
osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 
282624~4096] 3.3f6d2373) v4 currently delayed


When I restart osd.24, the warnings go away and everything is fine again.

Stefan
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Help] Use Ceph RBD as primary storage in CloudStack 4.0

2012-11-14 Thread Alex Jiang
Hi, Dan
Thank you for your reply. After installing Ceph, I was able to compile
qemu with RBD enabled and have added the host to CloudStack successfully.
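For anyone hitting the same configure error on CentOS, the fix boils down to installing the Ceph libraries and headers before building qemu; a sketch (package names are an assumption for the ceph.com el6 repository):

```shell
# install the Ceph libraries and headers that qemu's configure probes for
# (package names assumed; on CentOS 6.3 they come from the ceph.com yum repo)
yum install -y ceph ceph-devel

# with librbd/librados present, the RBD check now succeeds
cd qemu-1.2.0
./configure --enable-rbd
make && make install
```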

2012/11/14 Dan Mick dan.m...@inktank.com:
 Hi Alex:

 Did you install the ceph packages before trying to build qemu?  It sounds
 like qemu is looking for the Ceph libraries and not finding them.


 On 11/12/2012 09:38 PM, Alex Jiang wrote:

 Hi, All

 Has somebody used Ceph RBD in CloudStack as primary storage? I see
 that in the new features of CS 4.0, RBD is supported for KVM. So I
 tried using RBD as primary storage but met with some problems.

 I use a CentOS 6.3 server as the host. First I erased qemu-kvm (0.12.1)
 and libvirt (0.9.10) because their versions are too low (qemu on the
 hypervisor has to be compiled with RBD enabled, and the libvirt version on
 the hypervisor has to be at least 0.10 with RBD enabled). Then I
 downloaded the latest qemu (1.2.0) and libvirt (1.0.0) source code and
 compiled and installed them. But when compiling the qemu source code,

 #wget http://wiki.qemu-project.org/download/qemu-1.2.0.tar.bz2
 #tar jxvf qemu-1.2.0.tar.bz2
 # cd qemu-1.2.0
 # ./configure --enable-rbd

 the following errors occur:
 ERROR: User requested feature rados block device
 ERROR: configure was not able to find it

 But on Ubuntu 12.04 compiling the qemu source code succeeded, so now I am
 very confused. How can I use Ceph RBD as primary storage in CloudStack on
 CentOS 6.3? Can anyone help me?

 Best Regards,

   Alex




Re: ceph cluster hangs when rebooting one node

2012-11-14 Thread Aleksey Samarin
Hello!

I have the same problem: after switching off the second node, the
cluster hangs. Is there a solution?

All the best, Alex!

2012/11/12 Stefan Priebe - Profihost AG s.pri...@profihost.ag:
 On 12.11.2012 16:11, Sage Weil wrote:

 On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:

 Hello list,

 I was checking what happens if I reboot a Ceph node.

 Sadly, if I reboot one node, the whole Ceph cluster hangs and no I/O is
 possible.


 If you are using the current master, the new 'min_size' may be biting you;
 run 'ceph osd dump | grep ^pool' and see if min_size shows up for your pools.
 You can change that back to the normal behavior with


 No, I don't see any min_size:

 # ceph osd dump | grep ^pool
 pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344
 pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
 pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num
 1344 pgp_num 1344 last_change 1 owner 0
 pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344
 pgp_num 1344 last_change 1 owner 0
 pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
 3000 pgp_num 3000 last_change 958 owner 0


    ceph osd pool set <poolname> min_size 1

 Yes, this helps! But min_size is still not shown in 'ceph osd dump'. Also,
 when I reboot a node it takes up to 10-20 s until all OSDs from this node
 are marked failed and I/O starts again. Should I issue a 'ceph osd out'
 command first?

 But I already had
 min_size 1
 max_size 2
 set for each rule in my crushmap.


 Stefan


Authorization issues in the 0.54

2012-11-14 Thread Andrey Korolyov
Hi,
In 0.54, cephx is probably broken somehow:

$ ceph auth add client.qemukvm osd 'allow *' mon 'allow *' mds 'allow
*' -i qemukvm.key
2012-11-14 15:51:23.153910 7ff06441f780 -1 read 65 bytes from qemukvm.key
added key for client.qemukvm

$ ceph auth list
...
client.admin
key: [xx]
caps: [mds] allow *
caps: [mon] allow *
caps: [osd] allow *
client.qemukvm
key: [yy]
caps: [mds] allow *
caps: [mon] allow *
caps: [osd] allow *
...
$ virsh secret-set-value --secret uuid --base64 yy
set the username in the VM's XML...
$ virsh start testvm
kvm: -drive 
file=rbd:rbd/vm0:id=qemukvm:key=yy:auth_supported=cephx\;none:mon_host=192.168.10.125\:6789\;192.168.10.127\:6789\;192.168.10.129\:6789,if=none,id=drive-virtio-disk0,format=raw:
could not open disk image
rbd:rbd/vm0:id=qemukvm:key=yy:auth_supported=cephx\;none:mon_host=192.168.10.125\:6789\;192.168.10.127\:6789\;192.168.10.129\:6789:
Operation not permitted
$ virsh secret-set-value --secret uuid --base64 xx
set the username back to admin for the VM's disk
$ virsh start testvm
Finally, the VM started successfully.

All rbd commands issued from the CLI work fine with the appropriate
credentials, and the qemu binary was linked against the same librbd as the
running one. Does anyone have a suggestion?
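One way to narrow this down is to check that the secret libvirt hands to qemu is byte-for-byte the key Ceph expects, and to exercise the same credentials outside libvirt entirely; a sketch (the UUID placeholder is as above, paths assumed):

```shell
# the key Ceph actually has for the client
ceph auth get-key client.qemukvm

# the value libvirt will pass to qemu; must match the line above exactly
virsh secret-get-value <uuid>

# exercise the same credentials without libvirt in the way
rbd --id qemukvm ls rbd
qemu-img info "rbd:rbd/vm0:id=qemukvm:conf=/etc/ceph/ceph.conf"
```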


problem with ceph and btrfs patch: set journal_info in async trans commit worker

2012-11-14 Thread Stefan Priebe - Profihost AG

Hello list,

I wanted to try out Ceph with the latest vanilla kernel, 3.7-rc5, and saw
a massive performance degradation. Around 22 btrfs-endio-write processes
appear every 10-20 seconds; they run for a long time while consuming a
massive amount of CPU.


So my performance drops from a steady 23,000 IOPS to bouncing between
23,000 and 0; the average is now 2,500 IOPS instead of 23,000.


Git bisect identifies commit e209db7ace281ca347b1ac699bf1fb222eac03fe
("Btrfs: set journal_info in async trans commit worker") as the
problematic patch.


When I revert this one, everything is fine again.
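For reference, the bisect-and-revert workflow described above can be sketched as follows (the good/bad tags are assumptions; the commit hash is the one from this report):

```shell
# bisect between the last known-good kernel and the regressing -rc
git bisect start
git bisect bad v3.7-rc5      # shows the IOPS collapse
git bisect good v3.6         # assumed last known-good release
# ...build, boot, and benchmark each candidate, marking it
# 'git bisect good' or 'git bisect bad' until one commit remains...

# verify by reverting the identified commit on a test branch
git checkout -b test-revert v3.7-rc5
git revert e209db7ace281ca347b1ac699bf1fb222eac03fe
```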

Is this known?

Greets,
Stefan


Re: endless flying slow requests

2012-11-14 Thread Sage Weil
Hi Stefan,

It would be nice to confirm that no clients are waiting on replies for 
these requests; currently we suspect that the OSD request tracking is the 
buggy part.  If you query the OSD admin socket you should be able to dump 
requests and see the client IP, and then query the client.  

Is it librbd?  In that case you likely need to change the config so that 
it is listening on an admin socket ('admin socket = path').
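Sage's suggestion can be sketched as follows (socket paths and command names are assumptions; they vary across Ceph versions):

```shell
# on the OSD node: dump the requests osd.24 thinks are still in flight
ceph --admin-daemon /var/run/ceph/ceph-osd.24.asok dump_ops_in_flight

# on the client, after adding to ceph.conf:
#   [client]
#   admin socket = /var/run/ceph/$name.$pid.asok
# ask librbd's objecter whether it is actually waiting on anything
ceph --admin-daemon /var/run/ceph/client.admin.12345.asok objecter_requests
```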

Thanks!
sage


On Wed, 14 Nov 2012, Stefan Priebe - Profihost AG wrote:

 Hello list,
 
 I see this several times: endless flying slow requests that never stop
 until I restart the OSD in question.
 
 2012-11-14 10:11:57.513395 osd.24 [WRN] 1 slow requests, 1 included below;
 oldest blocked for  31789.858457 secs
 2012-11-14 10:11:57.513399 osd.24 [WRN] slow request 31789.858457 seconds old,
 received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719
 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4
 currently delayed
 2012-11-14 10:11:58.513584 osd.24 [WRN] 1 slow requests, 1 included below;
 oldest blocked for  31790.858646 secs
 2012-11-14 10:11:58.513586 osd.24 [WRN] slow request 31790.858646 seconds old,
 received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719
 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4
 currently delayed
 2012-11-14 10:11:59.513766 osd.24 [WRN] 1 slow requests, 1 included below;
 oldest blocked for  31791.858827 secs
 2012-11-14 10:11:59.513768 osd.24 [WRN] slow request 31791.858827 seconds old,
 received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719
 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4
 currently delayed
 2012-11-14 10:12:00.513909 osd.24 [WRN] 1 slow requests, 1 included below;
 oldest blocked for  31792.858971 secs
 2012-11-14 10:12:00.513916 osd.24 [WRN] slow request 31792.858971 seconds old,
 received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719
 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4
 currently delayed
 2012-11-14 10:12:01.514061 osd.24 [WRN] 1 slow requests, 1 included below;
 oldest blocked for  31793.859124 secs
 2012-11-14 10:12:01.514063 osd.24 [WRN] slow request 31793.859124 seconds old,
 received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719
 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4
 currently delayed
 
 When I restart osd.24, the warnings go away and everything is fine again.
 
 Stefan
 
 


Re: ceph cluster hangs when rebooting one node

2012-11-14 Thread Sage Weil
On Wed, 14 Nov 2012, Aleksey Samarin wrote:
 Hello!
 
 I have the same problem. After switching off the second node, the
 cluster hangs, there is some solution?
 
 All the best, Alex!

I suspect this is min_size; the latest master has a few changes and will 
also print it out so you can tell what is going on.

min_size is the minimum number of replicas that must be available before 
the OSDs will go active (handle reads/writes).  Setting it to 1 gets you 
the old behavior, while increasing it protects you from the case where a 
write lands on a single replica that then fails, forcing the admin to make 
a difficult decision about losing data.

You can adjust with

 ceph osd pool set <pool name> min_size <value>
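Concretely, combining the two commands from this thread (the pool name is taken from the dump earlier in the thread; min_size only appears in the dump on newer builds):

```shell
# inspect the replication settings for each pool
ceph osd dump | grep ^pool

# restore the old behavior: serve I/O with a single surviving replica
ceph osd pool set kvmpool1 min_size 1
```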

sage

 
 2012/11/12 Stefan Priebe - Profihost AG s.pri...@profihost.ag:
  On 12.11.2012 16:11, Sage Weil wrote:
 
  On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
 
  Hello list,
 
   I was checking what happens if I reboot a Ceph node.
 
   Sadly, if I reboot one node, the whole Ceph cluster hangs and no I/O is
   possible.
 
 
   If you are using the current master, the new 'min_size' may be biting you;
   run 'ceph osd dump | grep ^pool' and see if min_size shows up for your pools.
   You can change that back to the normal behavior with
 
 
   No, I don't see any min_size:
 
  # ceph osd dump | grep ^pool
  pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344
  pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
  pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num
  1344 pgp_num 1344 last_change 1 owner 0
  pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344
  pgp_num 1344 last_change 1 owner 0
  pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
  3000 pgp_num 3000 last_change 958 owner 0
 
 
      ceph osd pool set <poolname> min_size 1
 
   Yes, this helps! But min_size is still not shown in 'ceph osd dump'. Also,
   when I reboot a node it takes up to 10-20 s until all OSDs from this node
   are marked failed and I/O starts again. Should I issue a 'ceph osd out'
   command first?
 
   But I already had
   min_size 1
   max_size 2
   set for each rule in my crushmap.
 
 
  Stefan
 
 


[PATCH 0/2] libceph: always init trail for osd requests

2012-11-14 Thread Alex Elder
This series makes ceph_osd_request->r_trail a structure that is
always initialized rather than a pointer.  The result behaves the
same as before, but it makes things simpler.

-Alex

[PATCH 1/2] libceph: always allow trail in osd request
[PATCH 2/2] libceph: kill op_needs_trail()


[PATCH 1/2] libceph: always allow trail in osd request

2012-11-14 Thread Alex Elder
An osd request structure contains an optional trail portion, which
if present will contain data to be passed in the payload portion of
the message containing the request.  The trail field is a
ceph_pagelist pointer, and if null it indicates there is no trail.

A ceph_pagelist structure contains a length field, and it can
legitimately hold the value 0.  Make use of this to change the
interpretation of the trail of an osd request so that every osd
request has trailing data; it just might have length 0.

This means we change the r_trail field in a ceph_osd_request
structure from a pointer to a structure that is always initialized.

Note that in ceph_osdc_start_request(), the trail pointer (or now
address of that structure) is assigned to a ceph message's trail
field.  Here's why that's still OK (looking at net/ceph/messenger.c):
- What would have resulted in a null pointer previously will now
  refer to a 0-length page list.  That message trail pointer
  is used in two functions, write_partial_msg_pages() and
  out_msg_pos_next().
- In write_partial_msg_pages(), a null page list pointer is
  handled the same as a message with a 0-length trail, and both
  result in an in_trail variable set to false.  The trail
  pointer is only used if in_trail is true.
- The only other place the message trail pointer is used is
  out_msg_pos_next().  That function is only called by
  write_partial_msg_pages() and only touches the trail pointer
  if the in_trail value it is passed is true.
Therefore a null ceph_msg->trail pointer is equivalent to a non-null
pointer referring to a 0-length page list structure.

Signed-off-by: Alex Elder el...@inktank.com
---
 include/linux/ceph/osd_client.h |4 ++--
 net/ceph/osd_client.c           |   43 +++
 2 files changed, 14 insertions(+), 33 deletions(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index f2e5d2c..61562c7 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -10,6 +10,7 @@
 #include <linux/ceph/osdmap.h>
 #include <linux/ceph/messenger.h>
 #include <linux/ceph/auth.h>
+#include <linux/ceph/pagelist.h>

 /*
  * Maximum object name size
@@ -22,7 +23,6 @@ struct ceph_snap_context;
 struct ceph_osd_request;
 struct ceph_osd_client;
 struct ceph_authorizer;
-struct ceph_pagelist;

 /*
  * completion callback for async writepages
@@ -95,7 +95,7 @@ struct ceph_osd_request {
struct bio   *r_bio;  /* instead of pages */
 #endif

-   struct ceph_pagelist *r_trail;/* trailing part of the data */
+   struct ceph_pagelist r_trail; /* trailing part of the data */
 };

 struct ceph_osd_event {
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 540276e..15984d2 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -163,10 +163,7 @@ void ceph_osdc_release_request(struct kref *kref)
 	bio_put(req->r_bio);
 #endif
 	ceph_put_snap_context(req->r_snapc);
-	if (req->r_trail) {
-		ceph_pagelist_release(req->r_trail);
-		kfree(req->r_trail);
-	}
+	ceph_pagelist_release(&req->r_trail);
 	if (req->r_mempool)
 		mempool_free(req, req->r_osdc->req_mempool);
 	else
@@ -200,8 +197,7 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
 {
 	struct ceph_osd_request *req;
 	struct ceph_msg *msg;
-	int needs_trail;
-	int num_op = get_num_ops(ops, &needs_trail);
+	int num_op = get_num_ops(ops, NULL);
 	size_t msg_size = sizeof(struct ceph_osd_request_head);

msg_size += num_op*sizeof(struct ceph_osd_op);
@@ -244,15 +240,7 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
 	}
 	req->r_reply = msg;
 
-	/* allocate space for the trailing data */
-	if (needs_trail) {
-		req->r_trail = kmalloc(sizeof(struct ceph_pagelist), gfp_flags);
-		if (!req->r_trail) {
-			ceph_osdc_put_request(req);
-			return NULL;
-		}
-		ceph_pagelist_init(req->r_trail);
-	}
+	ceph_pagelist_init(&req->r_trail);

/* create request message; allow space for oid */
msg_size += MAX_OBJ_NAME_SIZE;
@@ -304,29 +292,25 @@ static void osd_req_encode_op(struct ceph_osd_request *req,
 	case CEPH_OSD_OP_GETXATTR:
 	case CEPH_OSD_OP_SETXATTR:
 	case CEPH_OSD_OP_CMPXATTR:
-		BUG_ON(!req->r_trail);
-
 		dst->xattr.name_len = cpu_to_le32(src->xattr.name_len);
 		dst->xattr.value_len = cpu_to_le32(src->xattr.value_len);
 		dst->xattr.cmp_op = src->xattr.cmp_op;
 		dst->xattr.cmp_mode = src->xattr.cmp_mode;
-		ceph_pagelist_append(req->r_trail, src->xattr.name,
+		ceph_pagelist_append(&req->r_trail, src->xattr.name,
   

[PATCH 2/2] libceph: kill op_needs_trail()

2012-11-14 Thread Alex Elder
Since every osd message is now prepared to include trailing data,
there's no need to check ahead of time whether any operations will
make use of the trail portion of the message.

We can drop the second argument to get_num_ops(), and as a result we
can also get rid of op_needs_trail(), which is no longer used.

Signed-off-by: Alex Elder el...@inktank.com
---
 net/ceph/osd_client.c |   27 ---
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 15984d2..20b7921 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -32,20 +32,6 @@ static void __unregister_linger_request(struct ceph_osd_client *osdc,
 static void __send_request(struct ceph_osd_client *osdc,
 			   struct ceph_osd_request *req);

-static int op_needs_trail(int op)
-{
-   switch (op) {
-   case CEPH_OSD_OP_GETXATTR:
-   case CEPH_OSD_OP_SETXATTR:
-   case CEPH_OSD_OP_CMPXATTR:
-   case CEPH_OSD_OP_CALL:
-   case CEPH_OSD_OP_NOTIFY:
-   return 1;
-   default:
-   return 0;
-   }
-}
-
 static int op_has_extent(int op)
 {
return (op == CEPH_OSD_OP_READ ||
@@ -171,17 +157,12 @@ void ceph_osdc_release_request(struct kref *kref)
 }
 EXPORT_SYMBOL(ceph_osdc_release_request);

-static int get_num_ops(struct ceph_osd_req_op *ops, int *needs_trail)
+static int get_num_ops(struct ceph_osd_req_op *ops)
 {
int i = 0;

-   if (needs_trail)
-   *needs_trail = 0;
-   while (ops[i].op) {
-		if (needs_trail && op_needs_trail(ops[i].op))
-   *needs_trail = 1;
+   while (ops[i].op)
i++;
-   }

return i;
 }
@@ -197,7 +178,7 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
 {
struct ceph_osd_request *req;
struct ceph_msg *msg;
-   int num_op = get_num_ops(ops, NULL);
+   int num_op = get_num_ops(ops);
size_t msg_size = sizeof(struct ceph_osd_request_head);

msg_size += num_op*sizeof(struct ceph_osd_op);
@@ -357,7 +338,7 @@ void ceph_osdc_build_request(struct ceph_osd_request *req,
struct ceph_osd_req_op *src_op;
struct ceph_osd_op *op;
void *p;
-   int num_op = get_num_ops(src_ops, NULL);
+   int num_op = get_num_ops(src_ops);
size_t msg_size = sizeof(*head) + num_op*sizeof(*op);
int flags = req-r_flags;
u64 data_len = 0;
-- 
1.7.9.5



[PATCH 0/4] libceph: tighten up some interfaces

2012-11-14 Thread Alex Elder
While investigating exactly how and why rbd uses ceph_calc_raw_layout(),
I implemented some small changes to make it obvious to the caller that
certain functions won't cause side effects, or that certain functions do
or don't need certain parameters.

-Alex

[PATCH 1/4] libceph: pass length to ceph_osdc_build_request()
[PATCH 2/4] libceph: pass length to ceph_calc_file_object_mapping()
[PATCH 3/4] libceph: drop snapid in ceph_calc_raw_layout()
[PATCH 4/4] libceph: drop osdc from ceph_calc_raw_layout()


[PATCH 1/4] libceph: pass length to ceph_osdc_build_request()

2012-11-14 Thread Alex Elder
The len argument to ceph_osdc_build_request() is set up to be
passed by address, but that function never updates its value
so there's no need to do this.  Tighten up the interface by
passing the length directly.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |2 +-
 include/linux/ceph/osd_client.h |2 +-
 net/ceph/osd_client.c   |6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 9dc1d5f..08d1b6e 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1174,7 +1174,7 @@ static int rbd_do_request(struct request *rq,
 			   snapid, ofs, &len, &bno, osd_req, ops);
 	rbd_assert(ret == 0);
 
-	ceph_osdc_build_request(osd_req, ofs, &len, ops, snapc, mtime);
+	ceph_osdc_build_request(osd_req, ofs, len, ops, snapc, mtime);

if (linger_req) {
ceph_osdc_set_request_linger(osdc, osd_req);
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 61562c7..4bfb458 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -224,7 +224,7 @@ extern struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *
 				       struct bio *bio);
 
 extern void ceph_osdc_build_request(struct ceph_osd_request *req,
-				    u64 off, u64 *plen,
+				    u64 off, u64 len,
 				    struct ceph_osd_req_op *src_ops,
 				    struct ceph_snap_context *snapc,
 				    struct timespec *mtime);
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 20b7921..d550d9e 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -328,7 +328,7 @@ static void osd_req_encode_op(struct ceph_osd_request *req,
  *
  */
 void ceph_osdc_build_request(struct ceph_osd_request *req,
-			     u64 off, u64 *plen,
+			     u64 off, u64 len,
 			     struct ceph_osd_req_op *src_ops,
 			     struct ceph_snap_context *snapc,
 			     struct timespec *mtime)
@@ -382,7 +382,7 @@ void ceph_osdc_build_request(struct ceph_osd_request *req,
 
 	if (flags & CEPH_OSD_FLAG_WRITE) {
 		req->r_request->hdr.data_off = cpu_to_le16(off);
-		req->r_request->hdr.data_len = cpu_to_le32(*plen + data_len);
+		req->r_request->hdr.data_len = cpu_to_le32(len + data_len);
 	} else if (data_len) {
 		req->r_request->hdr.data_off = 0;
 		req->r_request->hdr.data_len = cpu_to_le32(data_len);
@@ -456,7 +456,7 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
 	req->r_num_pages = calc_pages_for(page_align, *plen);
 	req->r_page_alignment = page_align;
 
-	ceph_osdc_build_request(req, off, plen, ops,
+	ceph_osdc_build_request(req, off, *plen, ops,
 				snapc,
 				mtime);

-- 
1.7.9.5



[PATCH 2/4] libceph: pass length to ceph_calc_file_object_mapping()

2012-11-14 Thread Alex Elder
ceph_calc_file_object_mapping() takes (among other things) a file
offset and length, and based on the layout, determines the object
number (bno) backing the affected portion of the file's data and
the offset into that object where the desired range begins.  It also
computes the size that should be used for the request--either the
amount requested or something less if that would exceed the end of
the object.

This patch changes the input length parameter in this function so it
is used only for input.  That is, the argument will be passed by
value rather than by address, so the value provided won't get
updated by the function.

The value would only get updated if the length would surpass the
current object, and in that case the value it got updated to would
be exactly that returned in *oxlen.

Only one of the two callers is affected by this change.  Update
ceph_calc_raw_layout() so it records any updated value.

Signed-off-by: Alex Elder el...@inktank.com
---
 fs/ceph/ioctl.c |2 +-
 include/linux/ceph/osdmap.h |2 +-
 net/ceph/osd_client.c   |6 --
 net/ceph/osdmap.c   |9 -
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 36549a4..3b22150 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -194,7 +194,7 @@ static long ceph_ioctl_get_dataloc(struct file *file, void __user *arg)
 		return -EFAULT;
 
 	down_read(&osdc->map_sem);
-	r = ceph_calc_file_object_mapping(&ci->i_layout, dl.file_offset, &len,
+	r = ceph_calc_file_object_mapping(&ci->i_layout, dl.file_offset, len,
 					  &dl.object_no, &dl.object_offset,
 					  &olen);
 	if (r < 0)
diff --git a/include/linux/ceph/osdmap.h b/include/linux/ceph/osdmap.h
index c841396..9ea98d2 100644
--- a/include/linux/ceph/osdmap.h
+++ b/include/linux/ceph/osdmap.h
@@ -110,7 +110,7 @@ extern void ceph_osdmap_destroy(struct ceph_osdmap *map);
 
 /* calculate mapping of a file extent to an object */
 extern int ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
-					 u64 off, u64 *plen,
+					 u64 off, u64 len,
 					 u64 *bno, u64 *oxoff, u64 *oxlen);

 /* calculate mapping of object to a placement group */
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index d550d9e..60c4e15 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -53,13 +53,15 @@ int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
 	reqhead->snapid = cpu_to_le64(snapid);
 
 	/* object extent? */
-	r = ceph_calc_file_object_mapping(layout, off, plen, bno,
+	r = ceph_calc_file_object_mapping(layout, off, orig_len, bno,
 					  &objoff, &objlen);
 	if (r < 0)
 		return r;
-	if (*plen < orig_len)
+	if (objlen < orig_len) {
+		*plen = objlen;
 		dout(" skipping last %llu, final file extent %llu~%llu\n",
 		     orig_len - *plen, off, *plen);
+	}
 
 	if (op_has_extent(op->op)) {
 		op->extent.offset = objoff;
diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 27e904e..d7baf5d 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -1012,7 +1012,7 @@ bad:
  * pass a stride back to the caller.
  */
 int ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
-  u64 off, u64 *plen,
+  u64 off, u64 len,
   u64 *ono,
   u64 *oxoff, u64 *oxlen)
 {
@@ -1023,7 +1023,7 @@ int ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
 	u32 su_per_object;
 	u64 t, su_offset;
 
-	dout("mapping %llu~%llu  osize %u fl_su %u\n", off, *plen,
+	dout("mapping %llu~%llu  osize %u fl_su %u\n", off, len,
 	     osize, su);
if (su == 0 || sc == 0)
goto invalid;
@@ -1056,11 +1056,10 @@ int ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
 
 	/*
 	 * Calculate the length of the extent being written to the selected
-	 * object. This is the minimum of the full length requested (plen) or
+	 * object. This is the minimum of the full length requested (len) or
 	 * the remainder of the current stripe being written to.
 	 */
-	*oxlen = min_t(u64, *plen, su - su_offset);
-	*plen = *oxlen;
+	*oxlen = min_t(u64, len, su - su_offset);
 
 	dout(" obj extent %llu~%llu\n", *oxoff, *oxlen);
 	return 0;
-- 
1.7.9.5



[PATCH 4/4] libceph: drop osdc from ceph_calc_raw_layout()

2012-11-14 Thread Alex Elder
The osdc parameter to ceph_calc_raw_layout() is not used, so get rid
of it.  Consequently, the corresponding parameter in calc_layout()
becomes unused, so get rid of that as well.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |2 +-
 include/linux/ceph/osd_client.h |3 +--
 net/ceph/osd_client.c   |   10 --
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 4e44085..2d10504 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1170,7 +1170,7 @@ static int rbd_do_request(struct request *rq,
 	osd_req->r_oid_len = strlen(osd_req->r_oid);
 
 	rbd_layout_init(&osd_req->r_file_layout, rbd_dev->spec->pool_id);
-	ret = ceph_calc_raw_layout(osdc, &osd_req->r_file_layout,
+	ret = ceph_calc_raw_layout(&osd_req->r_file_layout,
 				   ofs, len, &bno, osd_req, ops);
 	rbd_assert(ret == 0);

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 0e82a0a..fe3a6e8 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -207,8 +207,7 @@ extern void ceph_osdc_handle_reply(struct ceph_osd_client *osdc,
 extern void ceph_osdc_handle_map(struct ceph_osd_client *osdc,
 struct ceph_msg *msg);

-extern int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
-   struct ceph_file_layout *layout,
+extern int ceph_calc_raw_layout(struct ceph_file_layout *layout,
u64 off, u64 *plen, u64 *bno,
struct ceph_osd_request *req,
struct ceph_osd_req_op *op);
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index f844a35..baaec06 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -38,8 +38,7 @@ static int op_has_extent(int op)
op == CEPH_OSD_OP_WRITE);
 }

-int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
-   struct ceph_file_layout *layout,
+int ceph_calc_raw_layout(struct ceph_file_layout *layout,
u64 off, u64 *plen, u64 *bno,
struct ceph_osd_request *req,
struct ceph_osd_req_op *op)
@@ -99,8 +98,7 @@ EXPORT_SYMBOL(ceph_calc_raw_layout);
  *
  * fill osd op in request message.
  */
-static int calc_layout(struct ceph_osd_client *osdc,
-  struct ceph_vino vino,
+static int calc_layout(struct ceph_vino vino,
   struct ceph_file_layout *layout,
   u64 off, u64 *plen,
   struct ceph_osd_request *req,
@@ -109,7 +107,7 @@ static int calc_layout(struct ceph_osd_client *osdc,
u64 bno;
int r;

-   r = ceph_calc_raw_layout(osdc, layout, off, plen, &bno, req, op);
+   r = ceph_calc_raw_layout(layout, off, plen, &bno, req, op);
if (r < 0)
return r;

@@ -444,7 +442,7 @@ struct ceph_osd_request
*ceph_osdc_new_request(struct ceph_osd_client *osdc,
return ERR_PTR(-ENOMEM);

/* calculate max write size */
-   r = calc_layout(osdc, vino, layout, off, plen, req, ops);
+   r = calc_layout(vino, layout, off, plen, req, ops);
if (r < 0)
return ERR_PTR(r);
req->r_file_layout = *layout;  /* keep a copy */
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] libceph: simplify ceph_osdc_alloc_request()

2012-11-14 Thread Alex Elder
These two patches just move a couple of things that
ceph_osdc_alloc_request() does out and into the caller.
It simplifies the function slightly, and makes it possible
for some callers to not have to supply irrelevant arguments.

-Alex

[PATCH 1/2] libceph: don't set flags in ceph_osdc_alloc_request()
[PATCH 2/2] libceph: don't set pages or bio in ceph_osdc_alloc_request()


[PATCH 1/2] libceph: don't set flags in ceph_osdc_alloc_request()

2012-11-14 Thread Alex Elder
The only thing ceph_osdc_alloc_request() really does with the
flags value it is passed is assign it to the newly-created
osd request structure.  Do that in the caller instead.

Both callers subsequently call ceph_osdc_build_request(), so have
that function (instead of ceph_osdc_alloc_request()) issue a warning
if a request comes through with neither the read nor write flags set.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |3 ++-
 include/linux/ceph/osd_client.h |1 -
 net/ceph/osd_client.c   |   11 ---
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 2d10504..b6b1522 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1150,13 +1150,14 @@ static int rbd_do_request(struct request *rq,
(unsigned long long) len, coll, coll_index);

osdc = rbd_dev->rbd_client->client->osdc;
-   osd_req = ceph_osdc_alloc_request(osdc, flags, snapc, ops,
+   osd_req = ceph_osdc_alloc_request(osdc, snapc, ops,
false, GFP_NOIO, pages, bio);
if (!osd_req) {
ret = -ENOMEM;
goto done_pages;
}

+   osd_req->r_flags = flags;
osd_req->r_callback = rbd_cb;

rbd_req->rq = rq;
diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index fe3a6e8..6ddda5b 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -213,7 +213,6 @@ extern int ceph_calc_raw_layout(struct
ceph_file_layout *layout,
struct ceph_osd_req_op *op);

 extern struct ceph_osd_request *ceph_osdc_alloc_request(struct
ceph_osd_client *osdc,
-  int flags,
   struct ceph_snap_context *snapc,
   struct ceph_osd_req_op *ops,
   bool use_mempool,
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index baaec06..3e82e61 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -163,7 +163,6 @@ static int get_num_ops(struct ceph_osd_req_op *ops)
 }

 struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client
*osdc,
-  int flags,
   struct ceph_snap_context *snapc,
   struct ceph_osd_req_op *ops,
   bool use_mempool,
@@ -200,10 +199,6 @@ struct ceph_osd_request
*ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
INIT_LIST_HEAD(&req->r_req_lru_item);
INIT_LIST_HEAD(&req->r_osd_item);

-   req->r_flags = flags;
-
-   WARN_ON((flags & (CEPH_OSD_FLAG_READ|CEPH_OSD_FLAG_WRITE)) == 0);
-
/* create reply message */
if (use_mempool)
msg = ceph_msgpool_get(&osdc->msgpool_op_reply, 0);
@@ -339,6 +334,8 @@ void ceph_osdc_build_request(struct ceph_osd_request
*req,
u64 data_len = 0;
int i;

+   WARN_ON((flags & (CEPH_OSD_FLAG_READ|CEPH_OSD_FLAG_WRITE)) == 0);
+
head = msg->front.iov_base;
head->snapid = cpu_to_le64(snap_id);
op = (void *)(head + 1);
@@ -434,12 +431,12 @@ struct ceph_osd_request
*ceph_osdc_new_request(struct ceph_osd_client *osdc,
} else
ops[1].op = 0;

-   req = ceph_osdc_alloc_request(osdc, flags,
-snapc, ops,
+   req = ceph_osdc_alloc_request(osdc, snapc, ops,
 use_mempool,
 GFP_NOFS, NULL, NULL);
if (!req)
return ERR_PTR(-ENOMEM);
+   req->r_flags = flags;

/* calculate max write size */
r = calc_layout(vino, layout, off, plen, req, ops);
-- 
1.7.9.5



[PATCH 2/2] libceph: don't set pages or bio in ceph_osdc_alloc_request()

2012-11-14 Thread Alex Elder
Only one of the two callers of ceph_osdc_alloc_request() provides
page or bio data for its payload.  And essentially all that function
was doing with those arguments was assigning them to fields in the
osd request structure.

Simplify ceph_osdc_alloc_request() by having the caller take care of
making those assignments.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |8 ++--
 include/linux/ceph/osd_client.h |4 +---
 net/ceph/osd_client.c   |   15 ++-
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index b6b1522..bdb099c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1150,14 +1150,18 @@ static int rbd_do_request(struct request *rq,
(unsigned long long) len, coll, coll_index);

osdc = rbd_dev->rbd_client->client->osdc;
-   osd_req = ceph_osdc_alloc_request(osdc, snapc, ops,
-   false, GFP_NOIO, pages, bio);
+   osd_req = ceph_osdc_alloc_request(osdc, snapc, ops, false, GFP_NOIO);
if (!osd_req) {
ret = -ENOMEM;
goto done_pages;
}

osd_req->r_flags = flags;
+   osd_req->r_pages = pages;
+   if (bio) {
+   osd_req->r_bio = bio;
+   bio_get(osd_req->r_bio);
+   }
osd_req->r_callback = rbd_cb;

rbd_req->rq = rq;
diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index 6ddda5b..75f56d3 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -216,9 +216,7 @@ extern struct ceph_osd_request
*ceph_osdc_alloc_request(struct ceph_osd_client *
   struct ceph_snap_context *snapc,
   struct ceph_osd_req_op *ops,
   bool use_mempool,
-  gfp_t gfp_flags,
-  struct page **pages,
-  struct bio *bio);
+  gfp_t gfp_flags);

 extern void ceph_osdc_build_request(struct ceph_osd_request *req,
u64 off, u64 len,
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 3e82e61..5ed9c92 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -166,9 +166,7 @@ struct ceph_osd_request
*ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
   struct ceph_snap_context *snapc,
   struct ceph_osd_req_op *ops,
   bool use_mempool,
-  gfp_t gfp_flags,
-  struct page **pages,
-  struct bio *bio)
+  gfp_t gfp_flags)
 {
struct ceph_osd_request *req;
struct ceph_msg *msg;
@@ -229,13 +227,6 @@ struct ceph_osd_request
*ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
memset(msg->front.iov_base, 0, msg->front.iov_len);

req->r_request = msg;
-   req->r_pages = pages;
-#ifdef CONFIG_BLOCK
-   if (bio) {
-   req->r_bio = bio;
-   bio_get(req->r_bio);
-   }
-#endif

return req;
 }
@@ -431,9 +422,7 @@ struct ceph_osd_request
*ceph_osdc_new_request(struct ceph_osd_client *osdc,
} else
ops[1].op = 0;

-   req = ceph_osdc_alloc_request(osdc, snapc, ops,
-use_mempool,
-GFP_NOFS, NULL, NULL);
+   req = ceph_osdc_alloc_request(osdc, snapc, ops, use_mempool, GFP_NOFS);
if (!req)
return ERR_PTR(-ENOMEM);
req->r_flags = flags;
-- 
1.7.9.5



[PATCH 0/4] rbd: disavow any support for multiple osd ops

2012-11-14 Thread Alex Elder
The rbd code is rife with places where it seems that an
osd request could support multiple osd ops.  But the
reality is that there are spots in rbd as well as libceph
and the messenger that make such support impossible without
some (upcoming, planned) additional work.

This series starts by getting rid of the notion that
anything but a single op will be passed for an osd
operation.  The first two patches just make it clear
that we never actually do send more than one op from
rbd anyway; the last two make the code reflect that,
simplifying things in the process.

-Alex

[PATCH 1/4] rbd: pass num_op with ops array
[PATCH 2/4] libceph: pass num_op with ops
[PATCH 3/4] rbd: there is really only one op
[PATCH 4/4] rbd: assume single op in a request


[PATCH] rbd: kill ceph_osd_req_op-flags

2012-11-14 Thread Alex Elder
The flags field of struct ceph_osd_req_op is never used, so just get
rid of it.

Signed-off-by: Alex Elder el...@inktank.com
---
 include/linux/ceph/osd_client.h |1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index 2b04d05..69287cc 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -157,7 +157,6 @@ struct ceph_osd_client {

 struct ceph_osd_req_op {
u16 op;   /* CEPH_OSD_OP_* */
-   u32 flags;/* CEPH_OSD_FLAG_* */
union {
struct {
u64 offset, length;
-- 
1.7.9.5



[PATCH 0/4] rbd: stop using ceph_calc_raw_layout()

2012-11-14 Thread Alex Elder
This series makes rbd no longer call ceph_calc_raw_layout(),
and in doing so, also stop calling ceph_calc_file_object_mapping()
for its requests.  Apparently the call to the former was for
the *other* side-effects it had (unrelated to the layout).

-Alex

[PATCH 1/4] rbd: pull in ceph_calc_raw_layout()
[PATCH 2/4] rbd: open code rbd_calc_raw_layout()
[PATCH 3/4] rbd: don't bother calculating file mapping
[PATCH 4/4] rbd: use a common layout for each device


[PATCH 1/4] rbd: pull in ceph_calc_raw_layout()

2012-11-14 Thread Alex Elder
This is the first in a series of patches aimed at eliminating
the use of ceph_calc_raw_layout() by rbd.

It simply pulls in a copy of that function and renames it
rbd_calc_raw_layout().

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |   36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e1094ff..810b58d 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1103,6 +1103,40 @@ static void rbd_layout_init(struct
ceph_file_layout *layout, u64 pool_id)
layout->fl_pg_pool = cpu_to_le32((u32) pool_id);
 }

+int rbd_calc_raw_layout(struct ceph_file_layout *layout,
+   u64 off, u64 *plen, u64 *bno,
+   struct ceph_osd_request *req,
+   struct ceph_osd_req_op *op)
+{
+   u64 orig_len = *plen;
+   u64 objoff, objlen;/* extent in object */
+   int r;
+
+   /* object extent? */
+   r = ceph_calc_file_object_mapping(layout, off, orig_len, bno,
+ &objoff, &objlen);
+   if (r < 0)
+   return r;
+   if (objlen < orig_len) {
+   *plen = objlen;
+   dout(" skipping last %llu, final file extent %llu~%llu\n",
+orig_len - *plen, off, *plen);
+   }
+
+   if (op->op == CEPH_OSD_OP_READ || op->op == CEPH_OSD_OP_WRITE) {
+   op->extent.offset = objoff;
+   op->extent.length = objlen;
+   }
+   req->r_num_pages = calc_pages_for(off, *plen);
+   req->r_page_alignment = off & ~PAGE_MASK;
+   if (op->op == CEPH_OSD_OP_WRITE)
+   op->payload_len = *plen;
+
+   dout("calc_layout bno=%llx %llu~%llu (%d pages)\n",
+*bno, objoff, objlen, req->r_num_pages);
+   return 0;
+}
+
 /*
  * Send ceph osd request
  */
@@ -1169,7 +1203,7 @@ static int rbd_do_request(struct request *rq,
osd_req->r_oid_len = strlen(osd_req->r_oid);

rbd_layout_init(&osd_req->r_file_layout, rbd_dev->spec->pool_id);
-   ret = ceph_calc_raw_layout(&osd_req->r_file_layout,
+   ret = rbd_calc_raw_layout(&osd_req->r_file_layout,
ofs, len, &bno, osd_req, op);
rbd_assert(ret == 0);

-- 
1.7.9.5



[PATCH 2/4] rbd: open code rbd_calc_raw_layout()

2012-11-14 Thread Alex Elder
This patch gets rid of rbd_calc_raw_layout() by simply open coding
it in its one caller.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |   55
+--
 1 file changed, 18 insertions(+), 37 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 810b58d..1afe51f 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1034,7 +1034,7 @@ static struct ceph_osd_req_op
*rbd_create_rw_op(int opcode, u32 payload_len)
return NULL;
/*
 * op extent offset and length will be set later on
-* in calc_raw_layout()
+* after ceph_calc_file_object_mapping().
 */
op->op = opcode;
op->payload_len = payload_len;
@@ -1103,40 +1103,6 @@ static void rbd_layout_init(struct
ceph_file_layout *layout, u64 pool_id)
layout->fl_pg_pool = cpu_to_le32((u32) pool_id);
 }

-int rbd_calc_raw_layout(struct ceph_file_layout *layout,
-   u64 off, u64 *plen, u64 *bno,
-   struct ceph_osd_request *req,
-   struct ceph_osd_req_op *op)
-{
-   u64 orig_len = *plen;
-   u64 objoff, objlen;/* extent in object */
-   int r;
-
-   /* object extent? */
-   r = ceph_calc_file_object_mapping(layout, off, orig_len, bno,
- &objoff, &objlen);
-   if (r < 0)
-   return r;
-   if (objlen < orig_len) {
-   *plen = objlen;
-   dout(" skipping last %llu, final file extent %llu~%llu\n",
-orig_len - *plen, off, *plen);
-   }
-
-   if (op->op == CEPH_OSD_OP_READ || op->op == CEPH_OSD_OP_WRITE) {
-   op->extent.offset = objoff;
-   op->extent.length = objlen;
-   }
-   req->r_num_pages = calc_pages_for(off, *plen);
-   req->r_page_alignment = off & ~PAGE_MASK;
-   if (op->op == CEPH_OSD_OP_WRITE)
-   op->payload_len = *plen;
-
-   dout("calc_layout bno=%llx %llu~%llu (%d pages)\n",
-*bno, objoff, objlen, req->r_num_pages);
-   return 0;
-}
-
 /*
  * Send ceph osd request
  */
@@ -1160,6 +1126,8 @@ static int rbd_do_request(struct request *rq,
struct ceph_osd_request *osd_req;
int ret;
u64 bno;
+   u64 obj_off = 0;
+   u64 obj_len = 0;
struct timespec mtime = CURRENT_TIME;
struct rbd_request *rbd_req;
struct ceph_osd_client *osdc;
@@ -1203,9 +1171,22 @@ static int rbd_do_request(struct request *rq,
osd_req->r_oid_len = strlen(osd_req->r_oid);

rbd_layout_init(&osd_req->r_file_layout, rbd_dev->spec->pool_id);
-   ret = rbd_calc_raw_layout(&osd_req->r_file_layout,
-   ofs, len, &bno, osd_req, op);
+   ret = ceph_calc_file_object_mapping(&osd_req->r_file_layout, ofs, len,
+   &bno, &obj_off, &obj_len);
rbd_assert(ret == 0);
+   if (obj_len < len) {
+   dout(" skipping last %llu, final file extent %llu~%llu\n",
+len - obj_len, ofs, obj_len);
+   len = obj_len;
+   }
+   if (op->op == CEPH_OSD_OP_READ || op->op == CEPH_OSD_OP_WRITE) {
+   op->extent.offset = obj_off;
+   op->extent.length = obj_len;
+   if (op->op == CEPH_OSD_OP_WRITE)
+   op->payload_len = obj_len;
+   }
+   osd_req->r_num_pages = calc_pages_for(ofs, len);
+   osd_req->r_page_alignment = ofs & ~PAGE_MASK;

ceph_osdc_build_request(osd_req, ofs, len, 1, op,
snapc, snapid, mtime);
-- 
1.7.9.5



[PATCH 3/4] rbd: don't bother calculating file mapping

2012-11-14 Thread Alex Elder
When rbd_do_request() has a request to process it initializes a ceph
file layout structure and uses it to compute offsets and limits for
the range of the request using ceph_calc_file_object_mapping().

The layout used is fixed, and is based on RBD_MAX_OBJ_ORDER (30).
It sets the layout's object size and stripe unit to be 1 GB (2^30),
and sets the stripe count to be 1.

The job of ceph_calc_file_object_mapping() is to determine which
of a sequence of objects will contain data covered by the range, and
within that object, at what offset the range starts.  It also
truncates the length of the range at the end of the selected object
if necessary.

This is needed for ceph fs, but for rbd it really serves no purpose.
It does its own blocking of images into objects, each of which is
(1 << obj_order) in size, and as a result it ignores the bno
value returned by ceph_calc_file_object_mapping().  In addition,
by the point a request has reached this function, it is already
destined for a single rbd object, and its length will not exceed
that object's extent.  Because of this, and because the mapping will
result in blocking up the range using an integer multiple of the
image's object order, ceph_calc_file_object_mapping() will never
change the offset or length values defined by the request.

In other words, this call is a big no-op for rbd data requests.

There is one exception.  We read the header object using this
function, and in that case we will not have already limited the
request size.  However, the header is a single object (not a file or
rbd image), and should not be broken into pieces anyway.  So in fact
we should *not* be calling ceph_calc_file_object_mapping() when
operating on the header object.

So...

Don't call ceph_calc_file_object_mapping() in rbd_do_request(),
because it is useless for image data and incorrect to do so for the
image header.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |   18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 1afe51f..30a73ae 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1125,9 +1125,6 @@ static int rbd_do_request(struct request *rq,
 {
struct ceph_osd_request *osd_req;
int ret;
-   u64 bno;
-   u64 obj_off = 0;
-   u64 obj_len = 0;
struct timespec mtime = CURRENT_TIME;
struct rbd_request *rbd_req;
struct ceph_osd_client *osdc;
@@ -1171,19 +1168,12 @@ static int rbd_do_request(struct request *rq,
osd_req->r_oid_len = strlen(osd_req->r_oid);

rbd_layout_init(&osd_req->r_file_layout, rbd_dev->spec->pool_id);
-   ret = ceph_calc_file_object_mapping(&osd_req->r_file_layout, ofs, len,
-   &bno, &obj_off, &obj_len);
-   rbd_assert(ret == 0);
-   if (obj_len < len) {
-   dout(" skipping last %llu, final file extent %llu~%llu\n",
-len - obj_len, ofs, obj_len);
-   len = obj_len;
-   }
+
if (op->op == CEPH_OSD_OP_READ || op->op == CEPH_OSD_OP_WRITE) {
-   op->extent.offset = obj_off;
-   op->extent.length = obj_len;
+   op->extent.offset = ofs;
+   op->extent.length = len;
if (op->op == CEPH_OSD_OP_WRITE)
-   op->payload_len = obj_len;
+   op->payload_len = len;
}
osd_req->r_num_pages = calc_pages_for(ofs, len);
osd_req->r_page_alignment = ofs & ~PAGE_MASK;
-- 
1.7.9.5



[PATCH 4/4] rbd: use a common layout for each device

2012-11-14 Thread Alex Elder
Each osd message includes a layout structure, and for rbd it is
always the same (at least for OSDs in a given pool).

Initialize a layout structure when an rbd_dev gets created and just
copy that into osd requests for the rbd image.

Replace an assertion that was done when initializing the layout
structures with code that catches and handles anything that would
trigger the assertion as soon as it is identified.  This precludes
that (bad) condition from ever occurring.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |   34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 30a73ae..fba0822 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -235,6 +235,8 @@ struct rbd_device {

char*header_name;

+   struct ceph_file_layout layout;
+
struct ceph_osd_event   *watch_event;
struct ceph_osd_request *watch_request;

@@ -1093,16 +1095,6 @@ static void rbd_coll_end_req(struct rbd_request
*rbd_req,
ret, len);
 }

-static void rbd_layout_init(struct ceph_file_layout *layout, u64 pool_id)
-{
-   memset(layout, 0, sizeof (*layout));
-   layout->fl_stripe_unit = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
-   layout->fl_stripe_count = cpu_to_le32(1);
-   layout->fl_object_size = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
-   rbd_assert(pool_id <= (u64) U32_MAX);
-   layout->fl_pg_pool = cpu_to_le32((u32) pool_id);
-}
-
 /*
  * Send ceph osd request
  */
@@ -1167,7 +1159,7 @@ static int rbd_do_request(struct request *rq,
strncpy(osd_req->r_oid, object_name, sizeof(osd_req->r_oid));
osd_req->r_oid_len = strlen(osd_req->r_oid);

-   rbd_layout_init(&osd_req->r_file_layout, rbd_dev->spec->pool_id);
+   osd_req->r_file_layout = rbd_dev->layout;   /* struct */

if (op->op == CEPH_OSD_OP_READ || op->op == CEPH_OSD_OP_WRITE) {
op->extent.offset = ofs;
@@ -2266,6 +2258,13 @@ struct rbd_device *rbd_dev_create(struct
rbd_client *rbdc,
rbd_dev->spec = spec;
rbd_dev->rbd_client = rbdc;

+   /* Initialize the layout used for all rbd requests */
+
+   rbd_dev->layout.fl_stripe_unit = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+   rbd_dev->layout.fl_stripe_count = cpu_to_le32(1);
+   rbd_dev->layout.fl_object_size = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+   rbd_dev->layout.fl_pg_pool = cpu_to_le32((u32) spec->pool_id);
+
return rbd_dev;
 }

@@ -2520,6 +2519,12 @@ static int rbd_dev_v2_parent_info(struct
rbd_device *rbd_dev)
if (parent_spec->pool_id == CEPH_NOPOOL)
goto out;   /* No parent?  No problem. */

+   /* The ceph file layout needs to fit pool id in 32 bits */
+
+   ret = -EIO;
+   if (WARN_ON(parent_spec->pool_id > (u64) U32_MAX))
+   goto out;
+
image_id = ceph_extract_encoded_string(&p, end, NULL, GFP_KERNEL);
if (IS_ERR(image_id)) {
ret = PTR_ERR(image_id);
@@ -3648,6 +3653,13 @@ static ssize_t rbd_add(struct bus_type *bus,
if (spec->pool_id == CEPH_NOPOOL)
goto err_out_client;

+   /* The ceph file layout needs to fit pool id in 32 bits */
+
+   if (WARN_ON(spec->pool_id > (u64) U32_MAX)) {
+   rc = -EIO;
+   goto err_out_client;
+   }
+
rbd_dev = rbd_dev_create(rbdc, spec);
if (!rbd_dev)
goto err_out_client;
-- 
1.7.9.5



Small feature request for v0.55 release

2012-11-14 Thread Nick Bartos
I see that v0.55 will be the next stable release.  Would it be
possible to use standard tarball naming conventions for this release?

If I download http://ceph.com/download/ceph-0.48.2.tar.bz2, the top
level directory is actually ceph-0.48.2argonaut, not ceph-0.48.2 as
expected.  Downloading
http://ceph.com/download/ceph-0.48.2argonaut.tar.bz2 yields a slightly
more expected result, but still isn't the typical *ix style of
name-version.tar.  This is very annoying in some build systems, which
have that assumption.  I've actually been extracting the tarballs,
renaming the top level directory, then recompressing them.

It would be great if we didn't have to do that with the next release,
e.g. extracting http://ceph.com/download/ceph-0.55.tar.bz2 would yield
a top level directory of ceph-0.55.
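For reference, the extract/rename/recompress workaround described above can be scripted. This sketch builds a stand-in tarball with the mismatched top-level directory (the version string and file contents are illustrative, not a real download) and then repacks it:

```shell
#!/bin/sh
# Demonstrate repacking a tarball whose top-level directory
# (ceph-0.48.2argonaut) does not match its file name (ceph-0.48.2).
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the published tarball (illustrative contents).
mkdir ceph-0.48.2argonaut
echo demo > ceph-0.48.2argonaut/README
tar cjf ceph-0.48.2.tar.bz2 ceph-0.48.2argonaut
rm -r ceph-0.48.2argonaut

# The workaround: extract, rename the top-level dir, recompress.
tar xjf ceph-0.48.2.tar.bz2
mv ceph-0.48.2argonaut ceph-0.48.2
tar cjf ceph-0.48.2-fixed.tar.bz2 ceph-0.48.2
```
GNU tar's `--transform` option can also rewrite the top-level directory on extraction without repacking, where build systems allow custom extract commands.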


Re: Small feature request for v0.55 release

2012-11-14 Thread Jimmy Tang

On 14 Nov 2012, at 16:14, Sage Weil wrote:

 
 Appending the codename to the version string is something we did with 
 argonaut (0.48argonaut) just to make it obvious to users which stable 
 version they are on.
 
 How do people feel about that?  Is it worthwhile?  Useless?  Ugly?
 
 We can certainly skip it for 0.55 bobtail…

Just throwing in some thoughts, but how about a scheme like 
${name}-stable-${version}.tar.bz2 and have the corresponding directory 
structure inside and just ditch code names in the tar ball filename? It doesn't 
look as nice without a codename, but it makes it absolutely clear to new users 
that it is a stable release.

Jimmy



Re: Small feature request for v0.55 release

2012-11-14 Thread Tren Blackburn
On Wed, Nov 14, 2012 at 1:53 PM, Nick Bartos n...@pistoncloud.com wrote:
 I see that v0.55 will be the next stable release.  Would it be
 possible to use standard tarball naming conventions for this release?

 If I download http://ceph.com/download/ceph-0.48.2.tar.bz2, the top
 level directory is actually ceph-0.48.2argonaut, not ceph-0.48.2 as
 expected.  Downloading
 http://ceph.com/download/ceph-0.48.2argonaut.tar.bz2 yields a slightly
 more expected result, but still isn't the typical *ix style of
 name-version.tar.  This is very annoying in some build systems, which
 have that assumption.  I've actually been extracting the tarballs,
 renaming the top level directory, then recompressing them.

 It would be great if we didn't have to do that with the next release,
 e.g. extracting http://ceph.com/download/ceph-0.55.tar.bz2 would yield
 a top level directory of ceph-0.55.

+1

I'm glad to see I'm not the only one doing this. Gentoo's ebuild
system doesn't respond kindly to this either.

t.


Re: Small feature request for v0.55 release

2012-11-14 Thread Tren Blackburn
On Wed, Nov 14, 2012 at 3:40 PM, Jimmy Tang jt...@tchpc.tcd.ie wrote:

 On 14 Nov 2012, at 16:14, Sage Weil wrote:


 Appending the codename to the version string is something we did with
 argonaut (0.48argonaut) just to make it obvious to users which stable
 version they are on.

 How do people feel about that?  Is it worthwhile?  Useless?  Ugly?

 We can certainly skip it for 0.55 bobtail…

 Just throwing in some thoughts, but how about a scheme like 
 ${name}-stable-${version}.tar.bz2 and have the corresponding directory 
 structure inside and just ditch code names in the tar ball filename? It 
 doesn't look as nice with out a codename, but it makes it absolutely clear to 
 new users that it is a stable release.

Personally, I'd prefer standard naming of ${name}-${version}.tar.bz2.
You make it clear on your site which version is the LTS release, and
which are the developer releases.

t.


Re: [PATCH] make mkcephfs and init-ceph osd filesystem handling more flexible

2012-11-14 Thread Sage Weil
Hi Danny,

Have you had a chance to work on this?  I'd like to include this 
in bobtail.  If you don't have time we can go ahead and implement it, but 
I'd like to avoid duplicating effort.

Thanks!
sage


On Fri, 2 Nov 2012, Danny Al-Gaaf wrote:
 Hi Sage,
 
 sorry for the late reply, was absent some weeks and busy with other issues.
 
 Am 17.08.2012 01:40, schrieb Sage Weil:
  On Thu, 16 Aug 2012, Tommi Virtanen wrote:
  On Thu, Aug 16, 2012 at 3:32 PM, Sage Weil s...@inktank.com wrote:
  As for the new options, I suggest:
 
   * osd fs type
   * osd fs devs   (will work for mkcephfs, not for new stuff)
   * osd fs path
   * osd fs options
 
  What does osd_fs_path mean, and how is it different from the osd_data dir?
  
  The idea was that you might want the fs mounted somewhere other than 
  osd_data.  I'm not sure it's useful; we may as well drop that...
  
  I'm expecting to need both mkfs-time options (btrfs metadata block
  size etc) and mount-time options (noatime etc).
 
  It would be nice if there was a way to set the options for all
  fstypes, and then just toggle which one is used (by default). That
  avoids bugs like trying to mkfs/mount btrfs with xfs-specific options,
  and vice versa.
 
  I'm not sure how well our config system will handle dynamic variable
  names -- ceph-authtool was fine with me just putting data in
  osd_crush_location, and we don't need to access these variables from
  C++, so it should be fine. If you really wanted to, you could probably
  cram the them into a single variable, with ad hoc structured data in
  the string value, but that's ugly.. Or just hardcode the list of
  possible filesystems, and then it's not dynamic variable names
  anymore.
  
  Yeah, ceph-conf will happily take anything.  The C++ code has to do 
  slightly more work to get arbitrary config fields, but that's not an 
  issue.
  
  So I'm dreaming of something like:
 
  [osd]
  # what mount options will be passed when an osd data disk is using
  # one of these filesystems; these are passed to mount -o
  osd mount options btrfs = herp,foo=bar
  osd mount options xfs = noatime,derp
 
  # what mkfs options are used when creating new osd data disk
  # filesystems
  osd mkfs options btrfs = --hur
  osd mkfs options xfs = --dur
 
  # what fstype to use by default when mkfs'ing; mounting will detect
  # what's there (with blkid) and work with anything
  osd mkfs type = btrfs
 
 I will prepare a patch with these for the current mkcephfs and init-ceph
 incl. aliases for the old keys and cmdline options where possible.
 
  # this may go away with mkcephfs 2.0, and it will have to get more
  # complex if we provide something for journals too, etc, because you
  # may want to pair specific data disks to specific journals (DH has
  # this need).. haven't had time to think it through, which is why i'm
  # leaning toward and here's a hook where you run something on the
  # host that calls ceph-disk-prepare etc on all the disks you want,
  # and using uuids to match journals to data disks -- this work has
  # not yet started)
  osd fs devs = /dev/sdb /dev/sdc
  
  This all looks good to me.  What do you think, Danny?
 
 This part (osd fs devs) is for a new mkcephfs 2.0 if I understand you
 correctly. Sounds okay to me atm. (Tommi: is there any new information
 on this? Did you already start to work on this?)
 
 Danny
 


Re: changed rbd cp behavior in 0.53

2012-11-14 Thread Dan Mick



On 11/12/2012 02:47 PM, Josh Durgin wrote:

On 11/12/2012 08:30 AM, Andrey Korolyov wrote:

Hi,

For this version, rbd cp assumes that destination pool is the same as
source, not 'rbd', if pool in the destination path is omitted.

rbd cp install/img testimg
rbd ls install
img testimg


Is this change permanent?

Thanks!


This is a regression. The previous behavior will be restored for 0.54.
I added http://tracker.newdream.net/issues/3478 to track it.


Actually, on detailed examination, it looks like this has been the 
behavior for a long time; I think the wiser course would be not to 
change this defaulting.  One could argue the value of such defaulting, 
but it's also true that you can specify the source and destination pools 
explicitly.


Andrey, any strong objection to leaving this the way it is?
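
The defaulting being debated can be sketched in a few lines of illustrative Python. This is not rbd's actual spec parser; the function name and defaults are assumptions made for the sketch:

```python
def parse_image_spec(spec, default_pool="rbd"):
    """Split a 'pool/image' spec into (pool, image).

    If no pool is given in the spec, fall back to default_pool.
    """
    if "/" in spec:
        pool, image = spec.split("/", 1)
        return pool, image
    return default_pool, spec

# The long-standing behavior under discussion: a bare destination
# defaults to the *source* pool rather than to 'rbd'.
src_pool, src_image = parse_image_spec("install/img")
dst_pool, dst_image = parse_image_spec("testimg", default_pool=src_pool)
print(dst_pool, dst_image)  # install testimg
```

Writing the destination explicitly, e.g. `rbd cp install/img rbd/testimg`, avoids relying on the default either way.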


Re: Authorization issues in the 0.54

2012-11-14 Thread Yehuda Sadeh
On Wed, Nov 14, 2012 at 4:20 AM, Andrey Korolyov and...@xdel.ru wrote:
 Hi,
 In 0.54, cephx is probably broken somehow:

 $ ceph auth add client.qemukvm osd 'allow *' mon 'allow *' mds 'allow
 *' -i qemukvm.key
 2012-11-14 15:51:23.153910 7ff06441f780 -1 read 65 bytes from qemukvm.key
 added key for client.qemukvm

 $ ceph auth list
 ...
 client.admin
 key: [xx]
 caps: [mds] allow *

Note that for mds you just specify 'allow' and not 'allow *'. It
shouldn't affect the stuff that you're testing though.

 caps: [mon] allow *
 caps: [osd] allow *
 client.qemukvm
 key: [yy]
 caps: [mds] allow *
 caps: [mon] allow *
 caps: [osd] allow *
 ...
 $ virsh secret-set-value --secret uuid --base64 yy
 set username in the VM's XML...
 $ virsh start testvm
 kvm: -drive 
 file=rbd:rbd/vm0:id=qemukvm:key=yy:auth_supported=cephx\;none:mon_host=192.168.10.125\:6789\;192.168.10.127\:6789\;192.168.10.129\:6789,if=none,id=drive-virtio-disk0,format=raw:
 could not open disk image
 rbd:rbd/vm0:id=qemukvm:key=yy:auth_supported=cephx\;none:mon_host=192.168.10.125\:6789\;192.168.10.127\:6789\;192.168.10.129\:6789:
 Operation not permitted
 $ virsh secret-set-value --secret uuid --base64 xx
 set username again to admin for the VM's disk
 $ virsh start testvm
 Finally, vm started successfully.

 All rbd commands issued from the CLI work okay with the appropriate
 credentials; the qemu binary was linked against the same librbd as the
 running one. Does anyone have a suggestion?

There wasn't any change that I'm aware of that should cause that. Can
you reproduce it with 'debug ms = 1' and 'debug auth = 20'?

Thanks,
Yehuda
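
As an aside on the drive string quoted above: qemu uses ':' to separate rbd option key=value pairs and ';' to separate multiple values such as monitor addresses, which is why both appear backslash-escaped in the log. A rough sketch of building such an option string (the helper names are hypothetical, not a qemu or ceph API):

```python
def qemu_escape(value):
    """Backslash-escape ':' so it is not taken as a qemu option separator."""
    return value.replace(":", r"\:")

def mon_host_option(addrs):
    """Join monitor addresses with an escaped ';' separator."""
    return "mon_host=" + r"\;".join(qemu_escape(a) for a in addrs)

opt = mon_host_option(["192.168.10.125:6789", "192.168.10.127:6789"])
print(opt)  # mon_host=192.168.10.125\:6789\;192.168.10.127\:6789
```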


Re: problem with ceph and btrfs patch: set journal_info in async trans commit worker

2012-11-14 Thread Miao Xie
Hi, Stefan

On wed, 14 Nov 2012 14:42:07 +0100, Stefan Priebe - Profihost AG wrote:
 Hello list,
 
 I wanted to try out ceph with the latest vanilla kernel, 3.7-rc5, and saw a
 massive performance degradation. I see around 22 btrfs-endio-write processes
 every 10-20 seconds, and they run a long time while consuming a massive
 amount of CPU.
 
 So my performance drops from 23,000 IOPS to oscillating between 23,000 IOPS
 and 0 - the average is now 2,500 IOPS instead of 23,000.
 
 Git bisect points to commit e209db7ace281ca347b1ac699bf1fb222eac03fe (Btrfs:
 set journal_info in async trans commit worker) as the problematic patch.
 
 When I revert this one, everything is fine again.
 
 Is this known?

Could you try the following patch?

http://marc.info/?l=linux-btrfs&m=135175512030453&w=2

I think the patch

  Btrfs: set journal_info in async trans commit worker

is not the real cause of the regression.

I guess it is caused by a bug in the reservation code. When we join the
same transaction handle more than twice, the pointer to the reservation
in the transaction handle is lost, and the statistics in the reservation
become corrupted. We then trigger the space flush, which may block your
tasks.

Thanks
Miao
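
The failure mode Miao describes can be illustrated with a toy model. This is not the actual btrfs code, just a sketch of how a re-join that unconditionally overwrites the handle's single reservation pointer would lose the outer caller's reservation:

```python
class TransHandle:
    """Toy transaction handle holding one block-reservation pointer."""
    def __init__(self):
        self.use_count = 0
        self.block_rsv = None

def join_transaction(handle, rsv=None):
    """Join (or re-join) a transaction.

    Models the buggy pattern: the assignment is unconditional, so a
    nested join with no reservation clobbers the existing pointer.
    """
    handle.use_count += 1
    handle.block_rsv = rsv
    return handle

h = TransHandle()
join_transaction(h, rsv="outer-reservation")
join_transaction(h)   # nested join with no reservation of its own
print(h.block_rsv)    # None -- the outer reservation was lost
```

Once the pointer is gone, accounting against that reservation goes to the wrong place, which is consistent with spurious space flushes blocking writers.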

 
 Greets,
 Stefan
 



Re: Small feature request for v0.55 release

2012-11-14 Thread Nick Bartos
My personal preference would be for ${name}-${version}.tar.bz2 as
well, but 2nd place would be ${name}-stable-${version}.tar.bz2.


On Wed, Nov 14, 2012 at 3:47 PM, Tren Blackburn t...@eotnetworks.com wrote:
 On Wed, Nov 14, 2012 at 3:40 PM, Jimmy Tang jt...@tchpc.tcd.ie wrote:

 On 14 Nov 2012, at 16:14, Sage Weil wrote:


 Appending the codename to the version string is something we did with
 argonaut (0.48argonaut) just to make it obvious to users which stable
 version they are on.

 How do people feel about that?  Is it worthwhile?  Useless?  Ugly?

 We can certainly skip it for 0.55 bobtail…

 Just throwing in some thoughts, but how about a scheme like 
 ${name}-stable-${version}.tar.bz2, with the corresponding directory 
 structure inside, and just ditch code names in the tarball filename? It 
 doesn't look as nice without a codename, but it makes it absolutely clear 
 to new users that it is a stable release.

 Personally, I'd prefer standard naming of ${name}-${version}.tar.bz2.
 You make it clear on your site which version is the LTS release, and
 which are the developer releases.

 t.


Re: changed rbd cp behavior in 0.53

2012-11-14 Thread Andrey Korolyov
On Thu, Nov 15, 2012 at 4:56 AM, Dan Mick dan.m...@inktank.com wrote:


 On 11/12/2012 02:47 PM, Josh Durgin wrote:

 On 11/12/2012 08:30 AM, Andrey Korolyov wrote:

 Hi,

 For this version, rbd cp assumes that the destination pool is the same as
 the source pool, not 'rbd', when the pool in the destination path is omitted.

 rbd cp install/img testimg
 rbd ls install
 img testimg


 Is this change permanent?

 Thanks!


 This is a regression. The previous behavior will be restored for 0.54.
 I added http://tracker.newdream.net/issues/3478 to track it.


 Actually, on detailed examination, it looks like this has been the behavior
 for a long time; I think the wiser course would be not to change this
 defaulting.  One could argue the value of such defaulting, but it's also
 true that you can specify the source and destination pools explicitly.

 Andrey, any strong objection to leaving this the way it is?

I'm not complaining - this behavior seems more logical in the first
place, and of course I use the full path even when doing something by hand.


OSD crash on 0.48.2argonaut

2012-11-14 Thread Eric_YH_Chen
Dear All:

I met this issue on one of the OSD nodes. Is this a known issue? Thanks!

ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
 1: /usr/bin/ceph-osd() [0x6edaba]
 2: (()+0xfcb0) [0x7f08b112dcb0]
 3: (gsignal()+0x35) [0x7f08afd09445]
 4: (abort()+0x17b) [0x7f08afd0cbab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f08b065769d]
 6: (()+0xb5846) [0x7f08b0655846]
 7: (()+0xb5873) [0x7f08b0655873]
 8: (()+0xb596e) [0x7f08b065596e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x1de) [0x7a82fe]
 10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x693) [0x530f83]
 11: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, 
eversion_t)+0x159) [0x531ac9]
 12: (ReplicatedPG::sub_op_modify_reply(std::tr1::shared_ptrOpRequest)+0x15c) 
[0x53251c]
 13: (ReplicatedPG::do_sub_op_reply(std::tr1::shared_ptrOpRequest)+0x81) 
[0x54d241]
 14: (PG::do_request(std::tr1::shared_ptrOpRequest)+0x1e3) [0x600883]
 15: (OSD::dequeue_op(PG*)+0x238) [0x5bfaf8]
 16: (ThreadPool::worker()+0x4d5) [0x79f835]
 17: (ThreadPool::WorkThread::entry()+0xd) [0x5d87cd]
 18: (()+0x7e9a) [0x7f08b1125e9a]
 19: (clone()+0x6d) [0x7f08afdc54bd]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.
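
When triaging a trace like the one above, pulling out just the frame symbols makes it quicker to scan. An illustrative throwaway parser, not a ceph tool:

```python
import re

def frame_symbols(trace_lines):
    """Extract (frame_number, symbol) pairs from ceph backtrace lines
    of the form ' N: (symbol+0xOFFSET) [0xADDR]'."""
    frames = []
    for line in trace_lines:
        m = re.match(r"\s*(\d+):\s+\((.*?)\+0x[0-9a-f]+\)", line)
        if m:
            frames.append((int(m.group(1)), m.group(2)))
    return frames

trace = [" 10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x693) [0x530f83]"]
print(frame_symbols(trace))
```

Resolving the raw addresses back to source lines still requires the matching binary, e.g. via the `objdump -rdS` step the trace itself suggests.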

