endless flying slow requests
Hello list,

I see this several times: endless flying slow requests, and they never stop until I restart the mentioned OSD.

2012-11-14 10:11:57.513395 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31789.858457 secs
2012-11-14 10:11:57.513399 osd.24 [WRN] slow request 31789.858457 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:11:58.513584 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31790.858646 secs
2012-11-14 10:11:58.513586 osd.24 [WRN] slow request 31790.858646 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:11:59.513766 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31791.858827 secs
2012-11-14 10:11:59.513768 osd.24 [WRN] slow request 31791.858827 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:12:00.513909 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31792.858971 secs
2012-11-14 10:12:00.513916 osd.24 [WRN] slow request 31792.858971 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
2012-11-14 10:12:01.514061 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31793.859124 secs
2012-11-14 10:12:01.514063 osd.24 [WRN] slow request 31793.859124 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed

When I now restart osd.24, they go away and everything is fine again.
Stefan -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Help] Use Ceph RBD as primary storage in CloudStack 4.0
Hi, Dan

Thank you for your reply. After installing ceph, I can compile Qemu with RBD enabled and have added the host to CloudStack successfully.

2012/11/14 Dan Mick dan.m...@inktank.com:
> Hi Alex: did you install the ceph packages before trying to build qemu?
> It sounds like qemu is looking for the Ceph libraries and not finding them.
>
> On 11/12/2012 09:38 PM, Alex Jiang wrote:
>> Hi, All
>>
>> Has somebody used Ceph RBD in CloudStack as primary storage? I see that
>> in the new features of CS 4.0, RBD is supported for KVM, so I tried using
>> RBD as primary storage but met with some problems.
>>
>> I use a CentOS 6.3 server as host. First I erased qemu-kvm (0.12.1) and
>> libvirt (0.9.10) because their versions are too low (Qemu on the
>> hypervisor has to be compiled with RBD enabled; the libvirt version on
>> the hypervisor has to be at least 0.10 with RBD enabled). Then I
>> downloaded the latest qemu (1.2.0) and libvirt (1.0.0) source code and
>> compiled and installed them. But when compiling the qemu source code,
>>
>> # wget http://wiki.qemu-project.org/download/qemu-1.2.0.tar.bz2
>> # tar jxvf qemu-1.2.0.tar.bz2
>> # cd qemu-1.2.0
>> # ./configure --enable-rbd
>>
>> the following errors occur:
>>
>> ERROR: User requested feature rados block device
>> ERROR: configure was not able to find it
>>
>> But on Ubuntu 12.04 I tried compiling the qemu source code and succeeded.
>> Now I am very confused. How can I use Ceph RBD as primary storage in
>> CloudStack on CentOS 6.3? Can anyone help me?
>>
>> Best Regards,
>> Alex
Re: ceph cluster hangs when rebooting one node
Hello!

I have the same problem. After switching off the second node, the cluster hangs. Is there some solution?

All the best, Alex!

2012/11/12 Stefan Priebe - Profihost AG s.pri...@profihost.ag:
> Am 12.11.2012 16:11, schrieb Sage Weil:
>> On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
>>> Hello list,
>>> i was checking what happens if i reboot a ceph node. Sadly if i reboot
>>> one node, the whole ceph cluster hangs and no I/O is possible.
>>
>> If you are using the current master, the new 'min_size' may be biting you;
>>
>>   ceph osd dump | grep ^pool
>>
>> and see if you see min_size for your pools. You can change that back to
>> the normal behavior with
>
> No i don't see any min_size:
>
> # ceph osd dump | grep ^pool
> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
> pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 3000 pgp_num 3000 last_change 958 owner 0
>
>>   ceph osd pool set poolname min_size 1
>
> Yes this helps! But min_size is still not shown in ceph osd dump.
>
> Also when i reboot a node it takes up to 10s-20s until all osds from this
> node are set to failed and the I/O starts again. Should i issue a ceph osd
> out command before?
>
> But i had already min_size 1 / max_size 2 set for all my rules in my
> crushmap, for each rule.
>
> Stefan
Authorization issues in the 0.54
Hi,

In 0.54, cephx is probably broken somehow:

$ ceph auth add client.qemukvm osd 'allow *' mon 'allow *' mds 'allow *' -i qemukvm.key
2012-11-14 15:51:23.153910 7ff06441f780 -1 read 65 bytes from qemukvm.key
added key for client.qemukvm

$ ceph auth list
...
client.admin
        key: [xx]
        caps: [mds] allow *
        caps: [mon] allow *
        caps: [osd] allow *
client.qemukvm
        key: [yy]
        caps: [mds] allow *
        caps: [mon] allow *
        caps: [osd] allow *
...

$ virsh secret-set-value --secret uuid --base64 yy

(set username in the VM's xml...)

$ virsh start testvm
kvm: -drive file=rbd:rbd/vm0:id=qemukvm:key=yy:auth_supported=cephx\;none:mon_host=192.168.10.125\:6789\;192.168.10.127\:6789\;192.168.10.129\:6789,if=none,id=drive-virtio-disk0,format=raw: could not open disk image rbd:rbd/vm0:id=qemukvm:key=yy:auth_supported=cephx\;none:mon_host=192.168.10.125\:6789\;192.168.10.127\:6789\;192.168.10.129\:6789: Operation not permitted

$ virsh secret-set-value --secret uuid --base64 xx

(set username again to admin for the VM's disk)

$ virsh start testvm

Finally, the vm started successfully.

All rbd commands issued from the cli work okay with the appropriate credentials, and the qemu binary was linked with the same librbd as the running one.

Does anyone have a suggestion?
problem with ceph and btrfs patch: set journal_info in async trans commit worker
Hello list,

i wanted to try out ceph with the latest vanilla kernel 3.7-rc5, and I was seeing a massive performance degradation. I see around 22 btrfs-endio-write processes every 10-20 seconds, and they run a long time while consuming a massive amount of CPU. So my performance of 23,000 iops drops to an up and down between 23,000 iops and 0 - avg is now 2,500 iops instead of 23,000.

Git bisect shows me commit e209db7ace281ca347b1ac699bf1fb222eac03fe ("Btrfs: set journal_info in async trans commit worker") as the problematic patch. When i revert this one, everything is fine again.

Is this known?

Greets,
Stefan
Re: endless flying slow requests
Hi Stefan,

It would be nice to confirm that no clients are waiting on replies for these requests; currently we suspect that the OSD request tracking is the buggy part. If you query the OSD admin socket you should be able to dump requests and see the client IP, and then query the client. Is it librbd? In that case you likely need to change the config so that it is listening on an admin socket ('admin socket = path').

Thanks!
sage

On Wed, 14 Nov 2012, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> i see this several times. Endless flying slow requests. And they never
> stop until i restart the mentioned osd.
>
> 2012-11-14 10:11:57.513395 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31789.858457 secs
> 2012-11-14 10:11:57.513399 osd.24 [WRN] slow request 31789.858457 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
> 2012-11-14 10:11:58.513584 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31790.858646 secs
> 2012-11-14 10:11:58.513586 osd.24 [WRN] slow request 31790.858646 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
> 2012-11-14 10:11:59.513766 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31791.858827 secs
> 2012-11-14 10:11:59.513768 osd.24 [WRN] slow request 31791.858827 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
> 2012-11-14 10:12:00.513909 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31792.858971 secs
> 2012-11-14 10:12:00.513916 osd.24 [WRN] slow request 31792.858971 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
> 2012-11-14 10:12:01.514061 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for 31793.859124 secs
> 2012-11-14 10:12:01.514063 osd.24 [WRN] slow request 31793.859124 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.1399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
>
> When i now restart osd 24 they go away and everything is fine again.
>
> Stefan
Re: ceph cluster hangs when rebooting one node
On Wed, 14 Nov 2012, Aleksey Samarin wrote:
> Hello!
> I have the same problem. After switching off the second node, the
> cluster hangs. Is there some solution?
>
> All the best, Alex!

I suspect this is min_size; the latest master has a few changes and also will print it out so you can tell what is going on.

min_size is the minimum number of replicas before the OSDs will go active (handle reads/writes). Setting it to 1 gets you the old behavior, while increasing it protects you from cases where writes to a single replica that then fails would force the admin to make a difficult decision about losing data.

You can adjust it with

  ceph osd pool set <pool name> min_size <value>

sage

> 2012/11/12 Stefan Priebe - Profihost AG s.pri...@profihost.ag:
>> Am 12.11.2012 16:11, schrieb Sage Weil:
>>> On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
>>>> Hello list,
>>>> i was checking what happens if i reboot a ceph node. Sadly if i reboot
>>>> one node, the whole ceph cluster hangs and no I/O is possible.
>>>
>>> If you are using the current master, the new 'min_size' may be biting you;
>>>
>>>   ceph osd dump | grep ^pool
>>>
>>> and see if you see min_size for your pools. You can change that back to
>>> the normal behavior with
>>
>> No i don't see any min_size:
>>
>> # ceph osd dump | grep ^pool
>> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
>> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
>> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
>> pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 3000 pgp_num 3000 last_change 958 owner 0
>>
>>>   ceph osd pool set poolname min_size 1
>>
>> Yes this helps! But min_size is still not shown in ceph osd dump.
>>
>> Also when i reboot a node it takes up to 10s-20s until all osds from this
>> node are set to failed and the I/O starts again. Should i issue a ceph osd
>> out command before?
>>
>> But i had already min_size 1 / max_size 2 set for all my rules in my
>> crushmap, for each rule.
>>
>> Stefan
[PATCH 0/2] libceph: always init trail for osd requests
This series makes the ceph_osd_request->r_trail be a structure that's always initialized, rather than a pointer. The result works equivalently to before, but it makes things simpler.

-Alex

[PATCH 1/2] libceph: always allow trail in osd request
[PATCH 2/2] libceph: kill op_needs_trail()
[PATCH 1/2] libceph: always allow trail in osd request
An osd request structure contains an optional trail portion, which if present will contain data to be passed in the payload portion of the message containing the request. The trail field is a ceph_pagelist pointer, and if null it indicates there is no trail.

A ceph_pagelist structure contains a length field, and it can legitimately hold the value 0. Make use of this to change the interpretation of the trail of an osd request so that every osd request has trailing data, it just might have length 0. This means we change the r_trail field in a ceph_osd_request structure from a pointer to a structure that is always initialized.

Note that in ceph_osdc_start_request(), the trail pointer (or now the address of that structure) is assigned to a ceph message's trail field. Here's why that's still OK (looking at net/ceph/messenger.c):

- What would have resulted in a null pointer previously will now refer to a 0-length page list. That message trail pointer is used in two functions, write_partial_msg_pages() and out_msg_pos_next().
- In write_partial_msg_pages(), a null page list pointer is handled the same as a message with a 0-length trail, and both result in an in_trail variable set to false. The trail pointer is only used if in_trail is true.
- The only other place the message trail pointer is used is out_msg_pos_next(). That function is only called by write_partial_msg_pages() and only touches the trail pointer if the in_trail value it is passed is true.

Therefore a null ceph_msg->trail pointer is equivalent to a non-null pointer referring to a 0-length page list structure.
Signed-off-by: Alex Elder el...@inktank.com
---
 include/linux/ceph/osd_client.h |  4 ++--
 net/ceph/osd_client.c           | 43 ++++++--------------------------
 2 files changed, 14 insertions(+), 33 deletions(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index f2e5d2c..61562c7 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -10,6 +10,7 @@
 #include <linux/ceph/osdmap.h>
 #include <linux/ceph/messenger.h>
 #include <linux/ceph/auth.h>
+#include <linux/ceph/pagelist.h>
 
 /*
  * Maximum object name size
@@ -22,7 +23,6 @@ struct ceph_snap_context;
 struct ceph_osd_request;
 struct ceph_osd_client;
 struct ceph_authorizer;
-struct ceph_pagelist;
 
 /*
  * completion callback for async writepages
@@ -95,7 +95,7 @@ struct ceph_osd_request {
 	struct bio *r_bio;		/* instead of pages */
 #endif
 
-	struct ceph_pagelist *r_trail;	/* trailing part of the data */
+	struct ceph_pagelist r_trail;	/* trailing part of the data */
 };
 
 struct ceph_osd_event {
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 540276e..15984d2 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -163,10 +163,7 @@ void ceph_osdc_release_request(struct kref *kref)
 		bio_put(req->r_bio);
 #endif
 	ceph_put_snap_context(req->r_snapc);
-	if (req->r_trail) {
-		ceph_pagelist_release(req->r_trail);
-		kfree(req->r_trail);
-	}
+	ceph_pagelist_release(&req->r_trail);
 	if (req->r_mempool)
 		mempool_free(req, req->r_osdc->req_mempool);
 	else
@@ -200,8 +197,7 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
 {
 	struct ceph_osd_request *req;
 	struct ceph_msg *msg;
-	int needs_trail;
-	int num_op = get_num_ops(ops, &needs_trail);
+	int num_op = get_num_ops(ops, NULL);
 	size_t msg_size = sizeof(struct ceph_osd_request_head);
 
 	msg_size += num_op*sizeof(struct ceph_osd_op);
@@ -244,15 +240,7 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
 	}
 	req->r_reply = msg;
 
-	/* allocate space for the trailing data */
-	if (needs_trail) {
-		req->r_trail = kmalloc(sizeof(struct ceph_pagelist), gfp_flags);
-		if (!req->r_trail) {
-			ceph_osdc_put_request(req);
-			return NULL;
-		}
-		ceph_pagelist_init(req->r_trail);
-	}
+	ceph_pagelist_init(&req->r_trail);
 
 	/* create request message; allow space for oid */
 	msg_size += MAX_OBJ_NAME_SIZE;
@@ -304,29 +292,25 @@ static void osd_req_encode_op(struct ceph_osd_request *req,
 	case CEPH_OSD_OP_GETXATTR:
 	case CEPH_OSD_OP_SETXATTR:
 	case CEPH_OSD_OP_CMPXATTR:
-		BUG_ON(!req->r_trail);
-
 		dst->xattr.name_len = cpu_to_le32(src->xattr.name_len);
 		dst->xattr.value_len = cpu_to_le32(src->xattr.value_len);
 		dst->xattr.cmp_op = src->xattr.cmp_op;
 		dst->xattr.cmp_mode = src->xattr.cmp_mode;
-		ceph_pagelist_append(req->r_trail, src->xattr.name,
+		ceph_pagelist_append(&req->r_trail, src->xattr.name,
[PATCH 2/2] libceph: kill op_needs_trail()
Since every osd message is now prepared to include trailing data, there's no need to check ahead of time whether any operations will make use of the trail portion of the message. We can drop the second argument to get_num_ops(), and as a result we can also get rid of op_needs_trail(), which is no longer used.

Signed-off-by: Alex Elder el...@inktank.com
---
 net/ceph/osd_client.c | 27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 15984d2..20b7921 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -32,20 +32,6 @@ static void __unregister_linger_request(struct ceph_osd_client *osdc,
 static void __send_request(struct ceph_osd_client *osdc,
 			   struct ceph_osd_request *req);
 
-static int op_needs_trail(int op)
-{
-	switch (op) {
-	case CEPH_OSD_OP_GETXATTR:
-	case CEPH_OSD_OP_SETXATTR:
-	case CEPH_OSD_OP_CMPXATTR:
-	case CEPH_OSD_OP_CALL:
-	case CEPH_OSD_OP_NOTIFY:
-		return 1;
-	default:
-		return 0;
-	}
-}
-
 static int op_has_extent(int op)
 {
 	return (op == CEPH_OSD_OP_READ ||
@@ -171,17 +157,12 @@ void ceph_osdc_release_request(struct kref *kref)
 }
 EXPORT_SYMBOL(ceph_osdc_release_request);
 
-static int get_num_ops(struct ceph_osd_req_op *ops, int *needs_trail)
+static int get_num_ops(struct ceph_osd_req_op *ops)
 {
 	int i = 0;
 
-	if (needs_trail)
-		*needs_trail = 0;
-	while (ops[i].op) {
-		if (needs_trail && op_needs_trail(ops[i].op))
-			*needs_trail = 1;
+	while (ops[i].op)
 		i++;
-	}
 
 	return i;
 }
@@ -197,7 +178,7 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
 {
 	struct ceph_osd_request *req;
 	struct ceph_msg *msg;
-	int num_op = get_num_ops(ops, NULL);
+	int num_op = get_num_ops(ops);
 	size_t msg_size = sizeof(struct ceph_osd_request_head);
 
 	msg_size += num_op*sizeof(struct ceph_osd_op);
@@ -357,7 +338,7 @@ void ceph_osdc_build_request(struct ceph_osd_request *req,
 	struct ceph_osd_req_op *src_op;
 	struct ceph_osd_op *op;
 	void *p;
-	int num_op = get_num_ops(src_ops, NULL);
+	int num_op = get_num_ops(src_ops);
 	size_t msg_size = sizeof(*head) + num_op*sizeof(*op);
 	int flags = req->r_flags;
 	u64 data_len = 0;
-- 
1.7.9.5
[PATCH 0/4] libceph: tighten up some interfaces
While investigating exactly how and why rbd uses ceph_calc_raw_layout(), I implemented some small changes to some functions to make it obvious to the caller that certain functions won't cause side-effects, or that certain functions do or don't need certain parameters.

-Alex

[PATCH 1/4] libceph: pass length to ceph_osdc_build_request()
[PATCH 2/4] libceph: pass length to ceph_calc_file_object_mapping()
[PATCH 3/4] libceph: drop snapid in ceph_calc_raw_layout()
[PATCH 4/4] libceph: drop osdc from ceph_calc_raw_layout()
[PATCH 1/4] libceph: pass length to ceph_osdc_build_request()
The len argument to ceph_osdc_build_request() is set up to be passed by address, but that function never updates its value, so there's no need to do this. Tighten up the interface by passing the length directly.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c             | 2 +-
 include/linux/ceph/osd_client.h | 2 +-
 net/ceph/osd_client.c           | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 9dc1d5f..08d1b6e 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1174,7 +1174,7 @@ static int rbd_do_request(struct request *rq,
 			   snapid, ofs, &len, &bno, osd_req, ops);
 	rbd_assert(ret == 0);
 
-	ceph_osdc_build_request(osd_req, ofs, &len, ops, snapc, &mtime);
+	ceph_osdc_build_request(osd_req, ofs, len, ops, snapc, &mtime);
 
 	if (linger_req) {
 		ceph_osdc_set_request_linger(osdc, osd_req);
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 61562c7..4bfb458 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -224,7 +224,7 @@ extern struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *
 					       struct bio *bio);
 
 extern void ceph_osdc_build_request(struct ceph_osd_request *req,
-				    u64 off, u64 *plen,
+				    u64 off, u64 len,
 				    struct ceph_osd_req_op *src_ops,
 				    struct ceph_snap_context *snapc,
 				    struct timespec *mtime);
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 20b7921..d550d9e 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -328,7 +328,7 @@ static void osd_req_encode_op(struct ceph_osd_request *req,
  */
 void ceph_osdc_build_request(struct ceph_osd_request *req,
-			     u64 off, u64 *plen,
+			     u64 off, u64 len,
 			     struct ceph_osd_req_op *src_ops,
 			     struct ceph_snap_context *snapc,
 			     struct timespec *mtime)
@@ -382,7 +382,7 @@ void ceph_osdc_build_request(struct ceph_osd_request *req,
 	if (flags & CEPH_OSD_FLAG_WRITE) {
 		req->r_request->hdr.data_off = cpu_to_le16(off);
-		req->r_request->hdr.data_len = cpu_to_le32(*plen + data_len);
+		req->r_request->hdr.data_len = cpu_to_le32(len + data_len);
 	} else if (data_len) {
 		req->r_request->hdr.data_off = 0;
 		req->r_request->hdr.data_len = cpu_to_le32(data_len);
@@ -456,7 +456,7 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
 	req->r_num_pages = calc_pages_for(page_align, *plen);
 	req->r_page_alignment = page_align;
 
-	ceph_osdc_build_request(req, off, plen, ops, snapc, mtime);
+	ceph_osdc_build_request(req, off, *plen, ops, snapc, mtime);
-- 
1.7.9.5
[PATCH 2/4] libceph: pass length to ceph_calc_file_object_mapping()
ceph_calc_file_object_mapping() takes (among other things) a file offset and length, and based on the layout, determines the object number (bno) backing the affected portion of the file's data and the offset into that object where the desired range begins. It also computes the size that should be used for the request--either the amount requested or something less if that would exceed the end of the object.

This patch changes the input length parameter in this function so it is used only for input. That is, the argument will be passed by value rather than by address, so the value provided won't get updated by the function. The value would only get updated if the length would surpass the current object, and in that case the value it got updated to would be exactly that returned in *oxlen.

Only one of the two callers is affected by this change. Update ceph_calc_raw_layout() so it records any updated value.

Signed-off-by: Alex Elder el...@inktank.com
---
 fs/ceph/ioctl.c             | 2 +-
 include/linux/ceph/osdmap.h | 2 +-
 net/ceph/osd_client.c       | 6 ++++--
 net/ceph/osdmap.c           | 9 ++++-----
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 36549a4..3b22150 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -194,7 +194,7 @@ static long ceph_ioctl_get_dataloc(struct file *file, void __user *arg)
 		return -EFAULT;
 
 	down_read(&osdc->map_sem);
-	r = ceph_calc_file_object_mapping(&ci->i_layout, dl.file_offset, &len,
+	r = ceph_calc_file_object_mapping(&ci->i_layout, dl.file_offset, len,
 					  &dl.object_no, &dl.object_offset,
 					  &olen);
 	if (r < 0)
diff --git a/include/linux/ceph/osdmap.h b/include/linux/ceph/osdmap.h
index c841396..9ea98d2 100644
--- a/include/linux/ceph/osdmap.h
+++ b/include/linux/ceph/osdmap.h
@@ -110,7 +110,7 @@ extern void ceph_osdmap_destroy(struct ceph_osdmap *map);
 
 /* calculate mapping of a file extent to an object */
 extern int ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
-					 u64 off, u64 *plen,
+					 u64 off, u64 len,
 					 u64 *bno, u64 *oxoff, u64 *oxlen);
 
 /* calculate mapping of object to a placement group */
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index d550d9e..60c4e15 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -53,13 +53,15 @@ int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
 	reqhead->snapid = cpu_to_le64(snapid);
 
 	/* object extent? */
-	r = ceph_calc_file_object_mapping(layout, off, plen, bno,
+	r = ceph_calc_file_object_mapping(layout, off, orig_len, bno,
 					  &objoff, &objlen);
 	if (r < 0)
 		return r;
-	if (*plen < orig_len)
+	if (objlen < orig_len) {
+		*plen = objlen;
 		dout(" skipping last %llu, final file extent %llu~%llu\n",
 		     orig_len - *plen, off, *plen);
+	}
 
 	if (op_has_extent(op->op)) {
 		op->extent.offset = objoff;
diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 27e904e..d7baf5d 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -1012,7 +1012,7 @@ bad:
  * pass a stride back to the caller.
  */
 int ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
-				  u64 off, u64 *plen,
+				  u64 off, u64 len,
 				  u64 *ono, u64 *oxoff, u64 *oxlen)
 {
@@ -1023,7 +1023,7 @@ int ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
 	u32 su_per_object;
 	u64 t, su_offset;
 
-	dout("mapping %llu~%llu  osize %u fl_su %u\n", off, *plen,
+	dout("mapping %llu~%llu  osize %u fl_su %u\n", off, len,
 	     osize, su);
 	if (su == 0 || sc == 0)
 		goto invalid;
@@ -1056,11 +1056,10 @@ int ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
 
 	/*
 	 * Calculate the length of the extent being written to the selected
-	 * object. This is the minimum of the full length requested (plen) or
+	 * object. This is the minimum of the full length requested (len) or
 	 * the remainder of the current stripe being written to.
 	 */
-	*oxlen = min_t(u64, *plen, su - su_offset);
-	*plen = *oxlen;
+	*oxlen = min_t(u64, len, su - su_offset);
 
 	dout(" obj extent %llu~%llu\n", *oxoff, *oxlen);
 	return 0;
-- 
1.7.9.5
[PATCH 4/4] libceph: drop osdc from ceph_calc_raw_layout()
The osdc parameter to ceph_calc_raw_layout() is not used, so get rid of it. Consequently, the corresponding parameter in calc_layout() becomes unused, so get rid of that as well.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c             |  2 +-
 include/linux/ceph/osd_client.h |  3 +--
 net/ceph/osd_client.c           | 10 ++++------
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 4e44085..2d10504 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1170,7 +1170,7 @@ static int rbd_do_request(struct request *rq,
 	osd_req->r_oid_len = strlen(osd_req->r_oid);
 
 	rbd_layout_init(&osd_req->r_file_layout, rbd_dev->spec->pool_id);
-	ret = ceph_calc_raw_layout(osdc, &osd_req->r_file_layout,
+	ret = ceph_calc_raw_layout(&osd_req->r_file_layout,
 				   ofs, &len, &bno, osd_req, ops);
 	rbd_assert(ret == 0);
 
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 0e82a0a..fe3a6e8 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -207,8 +207,7 @@ extern void ceph_osdc_handle_reply(struct ceph_osd_client *osdc,
 extern void ceph_osdc_handle_map(struct ceph_osd_client *osdc,
 				 struct ceph_msg *msg);
 
-extern int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
-				struct ceph_file_layout *layout,
+extern int ceph_calc_raw_layout(struct ceph_file_layout *layout,
 				u64 off, u64 *plen, u64 *bno,
 				struct ceph_osd_request *req,
 				struct ceph_osd_req_op *op);
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index f844a35..baaec06 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -38,8 +38,7 @@ static int op_has_extent(int op)
 		op == CEPH_OSD_OP_WRITE);
 }
 
-int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
-			 struct ceph_file_layout *layout,
+int ceph_calc_raw_layout(struct ceph_file_layout *layout,
 			 u64 off, u64 *plen, u64 *bno,
 			 struct ceph_osd_request *req,
 			 struct ceph_osd_req_op *op)
@@ -99,8 +98,7 @@ EXPORT_SYMBOL(ceph_calc_raw_layout);
  *
  * fill osd op in request message.
  */
-static int calc_layout(struct ceph_osd_client *osdc,
-		       struct ceph_vino vino,
+static int calc_layout(struct ceph_vino vino,
 		       struct ceph_file_layout *layout,
 		       u64 off, u64 *plen,
 		       struct ceph_osd_request *req,
@@ -109,7 +107,7 @@ static int calc_layout(struct ceph_osd_client *osdc,
 	u64 bno;
 	int r;
 
-	r = ceph_calc_raw_layout(osdc, layout, off, plen, &bno, req, op);
+	r = ceph_calc_raw_layout(layout, off, plen, &bno, req, op);
 	if (r < 0)
 		return r;
 
@@ -444,7 +442,7 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
 		return ERR_PTR(-ENOMEM);
 
 	/* calculate max write size */
-	r = calc_layout(osdc, vino, layout, off, plen, req, ops);
+	r = calc_layout(vino, layout, off, plen, req, ops);
 	if (r < 0)
 		return ERR_PTR(r);
 	req->r_file_layout = *layout;	/* keep a copy */
-- 
1.7.9.5
[PATCH 0/2] libceph: simplify ceph_osdc_alloc_request()
These two patches just move a couple of things that ceph_osdc_alloc_request() does out and into the caller. It simplifies the function slightly, and makes it possible for some callers to not have to supply irrelevant arguments.

-Alex

[PATCH 1/2] libceph: don't set flags in ceph_osdc_alloc_request()
[PATCH 2/2] libceph: don't set pages or bio in ceph_osdc_alloc_request()
[PATCH 1/2] libceph: don't set flags in ceph_osdc_alloc_request()
The only thing ceph_osdc_alloc_request() really does with the flags value it is passed is assign it to the newly-created osd request structure. Do that in the caller instead. Both callers subsequently call ceph_osdc_build_request(), so have that function (instead of ceph_osdc_alloc_request()) issue a warning if a request comes through with neither the read nor write flags set. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c |3 ++- include/linux/ceph/osd_client.h |1 - net/ceph/osd_client.c | 11 --- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 2d10504..b6b1522 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -1150,13 +1150,14 @@ static int rbd_do_request(struct request *rq, (unsigned long long) len, coll, coll_index); osdc = rbd_dev-rbd_client-client-osdc; - osd_req = ceph_osdc_alloc_request(osdc, flags, snapc, ops, + osd_req = ceph_osdc_alloc_request(osdc, snapc, ops, false, GFP_NOIO, pages, bio); if (!osd_req) { ret = -ENOMEM; goto done_pages; } + osd_req-r_flags = flags; osd_req-r_callback = rbd_cb; rbd_req-rq = rq; diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index fe3a6e8..6ddda5b 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -213,7 +213,6 @@ extern int ceph_calc_raw_layout(struct ceph_file_layout *layout, struct ceph_osd_req_op *op); extern struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc, - int flags, struct ceph_snap_context *snapc, struct ceph_osd_req_op *ops, bool use_mempool, diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index baaec06..3e82e61 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -163,7 +163,6 @@ static int get_num_ops(struct ceph_osd_req_op *ops) } struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc, - int flags, struct ceph_snap_context *snapc, struct ceph_osd_req_op *ops, bool use_mempool, 
@@ -200,10 +199,6 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc, INIT_LIST_HEAD(req-r_req_lru_item); INIT_LIST_HEAD(req-r_osd_item); - req-r_flags = flags; - - WARN_ON((flags (CEPH_OSD_FLAG_READ|CEPH_OSD_FLAG_WRITE)) == 0); - /* create reply message */ if (use_mempool) msg = ceph_msgpool_get(osdc-msgpool_op_reply, 0); @@ -339,6 +334,8 @@ void ceph_osdc_build_request(struct ceph_osd_request *req, u64 data_len = 0; int i; + WARN_ON((flags (CEPH_OSD_FLAG_READ|CEPH_OSD_FLAG_WRITE)) == 0); + head = msg-front.iov_base; head-snapid = cpu_to_le64(snap_id); op = (void *)(head + 1); @@ -434,12 +431,12 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc, } else ops[1].op = 0; - req = ceph_osdc_alloc_request(osdc, flags, -snapc, ops, + req = ceph_osdc_alloc_request(osdc, snapc, ops, use_mempool, GFP_NOFS, NULL, NULL); if (!req) return ERR_PTR(-ENOMEM); + req-r_flags = flags; /* calculate max write size */ r = calc_layout(vino, layout, off, plen, req, ops); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] libceph: don't set pages or bio in ceph_osdc_alloc_request()
Only one of the two callers of ceph_osdc_alloc_request() provides page or bio data for its payload. And essentially all that function was doing with those arguments was assigning them to fields in the osd request structure. Simplify ceph_osdc_alloc_request() by having the caller take care of making those assignments Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c |8 ++-- include/linux/ceph/osd_client.h |4 +--- net/ceph/osd_client.c | 15 ++- 3 files changed, 9 insertions(+), 18 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index b6b1522..bdb099c 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -1150,14 +1150,18 @@ static int rbd_do_request(struct request *rq, (unsigned long long) len, coll, coll_index); osdc = rbd_dev-rbd_client-client-osdc; - osd_req = ceph_osdc_alloc_request(osdc, snapc, ops, - false, GFP_NOIO, pages, bio); + osd_req = ceph_osdc_alloc_request(osdc, snapc, ops, false, GFP_NOIO); if (!osd_req) { ret = -ENOMEM; goto done_pages; } osd_req-r_flags = flags; + osd_req-r_pages = pages; + if (bio) { + osd_req-r_bio = bio; + bio_get(osd_req-r_bio); + } osd_req-r_callback = rbd_cb; rbd_req-rq = rq; diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 6ddda5b..75f56d3 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -216,9 +216,7 @@ extern struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client * struct ceph_snap_context *snapc, struct ceph_osd_req_op *ops, bool use_mempool, - gfp_t gfp_flags, - struct page **pages, - struct bio *bio); + gfp_t gfp_flags); extern void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off, u64 len, diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 3e82e61..5ed9c92 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -166,9 +166,7 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc, struct ceph_snap_context *snapc, struct 
ceph_osd_req_op *ops, bool use_mempool, - gfp_t gfp_flags, - struct page **pages, - struct bio *bio) + gfp_t gfp_flags) { struct ceph_osd_request *req; struct ceph_msg *msg; @@ -229,13 +227,6 @@ struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc, memset(msg-front.iov_base, 0, msg-front.iov_len); req-r_request = msg; - req-r_pages = pages; -#ifdef CONFIG_BLOCK - if (bio) { - req-r_bio = bio; - bio_get(req-r_bio); - } -#endif return req; } @@ -431,9 +422,7 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc, } else ops[1].op = 0; - req = ceph_osdc_alloc_request(osdc, snapc, ops, -use_mempool, -GFP_NOFS, NULL, NULL); + req = ceph_osdc_alloc_request(osdc, snapc, ops, use_mempool, GFP_NOFS); if (!req) return ERR_PTR(-ENOMEM); req-r_flags = flags; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] rbd: disavow any support for multiple osd ops
The rbd code is rife with places where it seems that an osd request could support multiple osd ops. But the reality is that there are spots in rbd as well as libceph and the messenger that make such support impossible without some (upcoming, planned) additional work. This series starts by getting rid of the notion that anything but a single op will be passed for an osd operation. The first two patches just make it clear that we never actually send more than one op from rbd anyway; the last two make the code reflect that, simplifying things in the process. -Alex
[PATCH 1/4] rbd: pass num_op with ops array
[PATCH 2/4] libceph: pass num_op with ops
[PATCH 3/4] rbd: there is really only one op
[PATCH 4/4] rbd: assume single op in a request
[PATCH] rbd: kill ceph_osd_req_op-flags
The flags field of struct ceph_osd_req_op is never used, so just get rid of it.

Signed-off-by: Alex Elder el...@inktank.com
---
 include/linux/ceph/osd_client.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 2b04d05..69287cc 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -157,7 +157,6 @@ struct ceph_osd_client {
 struct ceph_osd_req_op {
 	u16 op;           /* CEPH_OSD_OP_* */
-	u32 flags;        /* CEPH_OSD_FLAG_* */
 	union {
 		struct {
 			u64 offset, length;
-- 
1.7.9.5
[PATCH 0/4] rbd: stop using ceph_calc_raw_layout()
This series makes rbd no longer call ceph_calc_raw_layout(), and in doing so also stops calling ceph_calc_file_object_mapping() for its requests. Apparently the call to the former was for the *other* side effects it had (unrelated to the layout). -Alex
[PATCH 1/4] rbd: pull in ceph_calc_raw_layout()
[PATCH 2/4] rbd: open code rbd_calc_raw_layout()
[PATCH 3/4] rbd: don't bother calculating file mapping
[PATCH 4/4] rbd: use a common layout for each device
[PATCH 1/4] rbd: pull in ceph_calc_raw_layout()
This is the first in a series of patches aimed at eliminating the use of ceph_calc_raw_layout() by rbd. It simply pulls in a copy of that function and renames it rbd_calc_raw_layout(). Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 36 +++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index e1094ff..810b58d 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -1103,6 +1103,40 @@ static void rbd_layout_init(struct ceph_file_layout *layout, u64 pool_id) layout-fl_pg_pool = cpu_to_le32((u32) pool_id); } +int rbd_calc_raw_layout(struct ceph_file_layout *layout, + u64 off, u64 *plen, u64 *bno, + struct ceph_osd_request *req, + struct ceph_osd_req_op *op) +{ + u64 orig_len = *plen; + u64 objoff, objlen;/* extent in object */ + int r; + + /* object extent? */ + r = ceph_calc_file_object_mapping(layout, off, orig_len, bno, + objoff, objlen); + if (r 0) + return r; + if (objlen orig_len) { + *plen = objlen; + dout( skipping last %llu, final file extent %llu~%llu\n, +orig_len - *plen, off, *plen); + } + + if (op-op == CEPH_OSD_OP_READ || op-op == CEPH_OSD_OP_WRITE) { + op-extent.offset = objoff; + op-extent.length = objlen; + } + req-r_num_pages = calc_pages_for(off, *plen); + req-r_page_alignment = off ~PAGE_MASK; + if (op-op == CEPH_OSD_OP_WRITE) + op-payload_len = *plen; + + dout(calc_layout bno=%llx %llu~%llu (%d pages)\n, +*bno, objoff, objlen, req-r_num_pages); + return 0; +} + /* * Send ceph osd request */ @@ -1169,7 +1203,7 @@ static int rbd_do_request(struct request *rq, osd_req-r_oid_len = strlen(osd_req-r_oid); rbd_layout_init(osd_req-r_file_layout, rbd_dev-spec-pool_id); - ret = ceph_calc_raw_layout(osd_req-r_file_layout, + ret = rbd_calc_raw_layout(osd_req-r_file_layout, ofs, len, bno, osd_req, op); rbd_assert(ret == 0); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info 
at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] rbd: open code rbd_calc_raw_layout()
This patch gets rid of rbd_calc_raw_layout() by simply open coding it in its one caller. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 55 +-- 1 file changed, 18 insertions(+), 37 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 810b58d..1afe51f 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -1034,7 +1034,7 @@ static struct ceph_osd_req_op *rbd_create_rw_op(int opcode, u32 payload_len) return NULL; /* * op extent offset and length will be set later on -* in calc_raw_layout() +* after ceph_calc_file_object_mapping(). */ op-op = opcode; op-payload_len = payload_len; @@ -1103,40 +1103,6 @@ static void rbd_layout_init(struct ceph_file_layout *layout, u64 pool_id) layout-fl_pg_pool = cpu_to_le32((u32) pool_id); } -int rbd_calc_raw_layout(struct ceph_file_layout *layout, - u64 off, u64 *plen, u64 *bno, - struct ceph_osd_request *req, - struct ceph_osd_req_op *op) -{ - u64 orig_len = *plen; - u64 objoff, objlen;/* extent in object */ - int r; - - /* object extent? 
*/ - r = ceph_calc_file_object_mapping(layout, off, orig_len, bno, - objoff, objlen); - if (r 0) - return r; - if (objlen orig_len) { - *plen = objlen; - dout( skipping last %llu, final file extent %llu~%llu\n, -orig_len - *plen, off, *plen); - } - - if (op-op == CEPH_OSD_OP_READ || op-op == CEPH_OSD_OP_WRITE) { - op-extent.offset = objoff; - op-extent.length = objlen; - } - req-r_num_pages = calc_pages_for(off, *plen); - req-r_page_alignment = off ~PAGE_MASK; - if (op-op == CEPH_OSD_OP_WRITE) - op-payload_len = *plen; - - dout(calc_layout bno=%llx %llu~%llu (%d pages)\n, -*bno, objoff, objlen, req-r_num_pages); - return 0; -} - /* * Send ceph osd request */ @@ -1160,6 +1126,8 @@ static int rbd_do_request(struct request *rq, struct ceph_osd_request *osd_req; int ret; u64 bno; + u64 obj_off = 0; + u64 obj_len = 0; struct timespec mtime = CURRENT_TIME; struct rbd_request *rbd_req; struct ceph_osd_client *osdc; @@ -1203,9 +1171,22 @@ static int rbd_do_request(struct request *rq, osd_req-r_oid_len = strlen(osd_req-r_oid); rbd_layout_init(osd_req-r_file_layout, rbd_dev-spec-pool_id); - ret = rbd_calc_raw_layout(osd_req-r_file_layout, - ofs, len, bno, osd_req, op); + ret = ceph_calc_file_object_mapping(osd_req-r_file_layout, ofs, len, + bno, obj_off, obj_len); rbd_assert(ret == 0); + if (obj_len len) { + dout( skipping last %llu, final file extent %llu~%llu\n, +len - obj_len, ofs, obj_len); + len = obj_len; + } + if (op-op == CEPH_OSD_OP_READ || op-op == CEPH_OSD_OP_WRITE) { + op-extent.offset = obj_off; + op-extent.length = obj_len; + if (op-op == CEPH_OSD_OP_WRITE) + op-payload_len = obj_len; + } + osd_req-r_num_pages = calc_pages_for(ofs, len); + osd_req-r_page_alignment = ofs ~PAGE_MASK; ceph_osdc_build_request(osd_req, ofs, len, 1, op, snapc, snapid, mtime); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] rbd: don't bother calculating file mapping
When rbd_do_request() has a request to process, it initializes a ceph file layout structure and uses it to compute offsets and limits for the range of the request using ceph_calc_file_object_mapping(). The layout used is fixed, and is based on RBD_MAX_OBJ_ORDER (30). It sets the layout's object size and stripe unit to be 1 GB (2^30), and sets the stripe count to be 1.

The job of ceph_calc_file_object_mapping() is to determine which of a sequence of objects will contain data covered by the range, and within that object, at what offset the range starts. It also truncates the length of the range at the end of the selected object if necessary. This is needed for ceph fs, but for rbd it really serves no purpose. rbd does its own blocking of images into objects, each of which is (1 << obj_order) in size, and as a result it ignores the bno value returned by ceph_calc_file_object_mapping(). In addition, by the time a request has reached this function, it is already destined for a single rbd object, and its length will not exceed that object's extent. Because of this, and because the mapping will result in blocking up the range using an integer multiple of the image's object order, ceph_calc_file_object_mapping() will never change the offset or length values defined by the request. In other words, this call is a big no-op for rbd data requests.

There is one exception. We read the header object using this function, and in that case we will not have already limited the request size. However, the header is a single object (not a file or rbd image), and should not be broken into pieces anyway. So in fact we should *not* be calling ceph_calc_file_object_mapping() when operating on the header object.

So... Don't call ceph_calc_file_object_mapping() in rbd_do_request(), because it is useless for image data and incorrect to do so for the image header.
Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 18 -- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 1afe51f..30a73ae 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -1125,9 +1125,6 @@ static int rbd_do_request(struct request *rq, { struct ceph_osd_request *osd_req; int ret; - u64 bno; - u64 obj_off = 0; - u64 obj_len = 0; struct timespec mtime = CURRENT_TIME; struct rbd_request *rbd_req; struct ceph_osd_client *osdc; @@ -1171,19 +1168,12 @@ static int rbd_do_request(struct request *rq, osd_req-r_oid_len = strlen(osd_req-r_oid); rbd_layout_init(osd_req-r_file_layout, rbd_dev-spec-pool_id); - ret = ceph_calc_file_object_mapping(osd_req-r_file_layout, ofs, len, - bno, obj_off, obj_len); - rbd_assert(ret == 0); - if (obj_len len) { - dout( skipping last %llu, final file extent %llu~%llu\n, -len - obj_len, ofs, obj_len); - len = obj_len; - } + if (op-op == CEPH_OSD_OP_READ || op-op == CEPH_OSD_OP_WRITE) { - op-extent.offset = obj_off; - op-extent.length = obj_len; + op-extent.offset = ofs; + op-extent.length = len; if (op-op == CEPH_OSD_OP_WRITE) - op-payload_len = obj_len; + op-payload_len = len; } osd_req-r_num_pages = calc_pages_for(ofs, len); osd_req-r_page_alignment = ofs ~PAGE_MASK; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] rbd: use a common layout for each device
Each osd message includes a layout structure, and for rbd it is always the same (at least for osd's in a given pool). Initialize a layout structure when an rbd_dev gets created and just copy that into osd requests for the rbd image. Replace an assertion that was done when initializing the layout structures with code that catches and handles anything that would trigger the assertion as soon as it is identified. This precludes that (bad) condition from ever occurring. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 34 +++--- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 30a73ae..fba0822 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -235,6 +235,8 @@ struct rbd_device { char*header_name; + struct ceph_file_layout layout; + struct ceph_osd_event *watch_event; struct ceph_osd_request *watch_request; @@ -1093,16 +1095,6 @@ static void rbd_coll_end_req(struct rbd_request *rbd_req, ret, len); } -static void rbd_layout_init(struct ceph_file_layout *layout, u64 pool_id) -{ - memset(layout, 0, sizeof (*layout)); - layout-fl_stripe_unit = cpu_to_le32(1 RBD_MAX_OBJ_ORDER); - layout-fl_stripe_count = cpu_to_le32(1); - layout-fl_object_size = cpu_to_le32(1 RBD_MAX_OBJ_ORDER); - rbd_assert(pool_id = (u64) U32_MAX); - layout-fl_pg_pool = cpu_to_le32((u32) pool_id); -} - /* * Send ceph osd request */ @@ -1167,7 +1159,7 @@ static int rbd_do_request(struct request *rq, strncpy(osd_req-r_oid, object_name, sizeof(osd_req-r_oid)); osd_req-r_oid_len = strlen(osd_req-r_oid); - rbd_layout_init(osd_req-r_file_layout, rbd_dev-spec-pool_id); + osd_req-r_file_layout = rbd_dev-layout; /* struct */ if (op-op == CEPH_OSD_OP_READ || op-op == CEPH_OSD_OP_WRITE) { op-extent.offset = ofs; @@ -2266,6 +2258,13 @@ struct rbd_device *rbd_dev_create(struct rbd_client *rbdc, rbd_dev-spec = spec; rbd_dev-rbd_client = rbdc; + /* Initialize the layout used for all rbd requests */ + + 
rbd_dev-layout.fl_stripe_unit = cpu_to_le32(1 RBD_MAX_OBJ_ORDER); + rbd_dev-layout.fl_stripe_count = cpu_to_le32(1); + rbd_dev-layout.fl_object_size = cpu_to_le32(1 RBD_MAX_OBJ_ORDER); + rbd_dev-layout.fl_pg_pool = cpu_to_le32((u32) spec-pool_id); + return rbd_dev; } @@ -2520,6 +2519,12 @@ static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev) if (parent_spec-pool_id == CEPH_NOPOOL) goto out; /* No parent? No problem. */ + /* The ceph file layout needs to fit pool id in 32 bits */ + + ret = -EIO; + if (WARN_ON(parent_spec-pool_id (u64) U32_MAX)) + goto out; + image_id = ceph_extract_encoded_string(p, end, NULL, GFP_KERNEL); if (IS_ERR(image_id)) { ret = PTR_ERR(image_id); @@ -3648,6 +3653,13 @@ static ssize_t rbd_add(struct bus_type *bus, if (spec-pool_id == CEPH_NOPOOL) goto err_out_client; + /* The ceph file layout needs to fit pool id in 32 bits */ + + if (WARN_ON(spec-pool_id (u64) U32_MAX)) { + rc = -EIO; + goto err_out_client; + } + rbd_dev = rbd_dev_create(rbdc, spec); if (!rbd_dev) goto err_out_client; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Small feature request for v0.55 release
I see that v0.55 will be the next stable release. Would it be possible to use standard tarball naming conventions for this release? If I download http://ceph.com/download/ceph-0.48.2.tar.bz2, the top-level directory is actually ceph-0.48.2argonaut, not ceph-0.48.2 as expected. Downloading http://ceph.com/download/ceph-0.48.2argonaut.tar.bz2 yields a slightly more expected result, but still isn't the typical *ix style of name-version.tar. This is very annoying in some build systems, which make that assumption. I've actually been extracting the tarballs, renaming the top-level directory, then recompressing them. It would be great if we didn't have to do that with the next release, e.g. extracting http://ceph.com/download/ceph-0.55.tar.bz2 would yield a top-level directory of ceph-0.55.
Re: Small feature request for v0.55 release
On 14 Nov 2012, at 16:14, Sage Weil wrote: Appending the codename to the version string is something we did with argonaut (0.48argonaut) just to make it obvious to users which stable version they are on. How do people feel about that? Is it worthwhile? Useless? Ugly? We can certainly skip it for 0.55 bobtail… Just throwing in some thoughts, but how about a scheme like ${name}-stable-${version}.tar.bz2, with the corresponding directory structure inside, and just ditch code names in the tarball filename? It doesn't look as nice without a codename, but it makes it absolutely clear to new users that it is a stable release. Jimmy
Re: Small feature request for v0.55 release
On Wed, Nov 14, 2012 at 1:53 PM, Nick Bartos n...@pistoncloud.com wrote: I see that v0.55 will be the next stable release. Would it be possible to use standard tarball naming conventions for this release? If I download http://ceph.com/download/ceph-0.48.2.tar.bz2, the top level directory is actually ceph-0.48.2argonaut, not ceph-0.48.2 as expected. Downloading http://ceph.com/download/ceph-0.48.2argonaut.tar.bz2 yields a slightly more expected result, but still isn't the typical *ix style of name-version.tar. This is very annoying in some build systems, which have that assumption. I've actually been extracting the tarballs, renaming the top level directory, then recompressing them. It would be great if we didn't have to do that with the next release, e.g. extracting http://ceph.com/download/ceph-0.55.tar.bz2 would yield a top level directory of ceph-0.55. +1 I'm glad to see I'm not the only one doing this. Gentoo's ebuild system doesn't respond kindly to this either. t. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Small feature request for v0.55 release
On Wed, Nov 14, 2012 at 3:40 PM, Jimmy Tang jt...@tchpc.tcd.ie wrote: On 14 Nov 2012, at 16:14, Sage Weil wrote: Appending the codename to the version string is something we did with argonaut (0.48argonaut) just to make it obvious to users which stable version they are on. How do people feel about that? Is it worthwhile? Useless? Ugly? We can certainly skip it for 0.55 bobtail… Just throwing in some thoughts, but how about a scheme like ${name}-stable-${version}.tar.bz2 and have the corresponding directory structure inside and just ditch code names in the tar ball filename? It doesn't look as nice with out a codename, but it makes it absolutely clear to new users that it is a stable release. Personally, I'd prefer standard naming of ${name}-${version}.tar.bz2. You make it clear on your site which version is the LTS release, and which are the developer releases. t. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] make mkcephfs and init-ceph osd filesystem handling more flexible
Hi Danny, Have you had a chance to work on this? I'd like to include this in bobtail. If you don't have time we can go ahead an implement it, but I'd like avoid duplicating effort. Thanks! sage On Fri, 2 Nov 2012, Danny Al-Gaaf wrote: Hi Sage, sorry for the late reply, was absent some weeks and busy with other issues. Am 17.08.2012 01:40, schrieb Sage Weil: On Thu, 16 Aug 2012, Tommi Virtanen wrote: On Thu, Aug 16, 2012 at 3:32 PM, Sage Weil s...@inktank.com wrote: As for the new options, I suggest: * osd fs type * osd fs devs (will work for mkcephfs, not for new stuff) * osd fs path * osd fs options What does osd_fs_path mean, and how is it different from the osd_data dir? The idea was that you might wand the fs mounted somewhere other that osd_data. I'm not sure it's useful; we may as well drop that... I'm expecting to need both mkfs-time options (btrfs metadata block size etc) and mount-time options (noatime etc). It would be nice if there was a way to set the options for all fstypes, and then just toggle which one is used (by default). That avoids bugs like trying to mkfs/mount btrfs with xfs-specific options, and vice versa. I'm not sure how well our config system will handle dynamic variable names -- ceph-authtool was fine with me just putting data in osd_crush_location, and we don't need to access these variables from C++, so it should be fine. If you really wanted to, you could probably cram the them into a single variable, with ad hoc structured data in the string value, but that's ugly.. Or just hardcode the list of possible filesystems, and then it's not dynamic variable names anymore. Yeah, ceph-conf will happily take anything. The C++ code has to do slightly more work to get arbitrary config fields, but that's not an issue. 
So I'm dreaming of something like:

[osd]
# what mount options will be passed when an osd data disk is using
# one of these filesystems; these are passed to mount -o
osd mount options btrfs = herp,foo=bar
osd mount options xfs = noatime,derp
# what mkfs options are used when creating new osd data disk
# filesystems
osd mkfs options btrfs = --hur
osd mkfs options xfs = --dur
# what fstype to use by default when mkfs'ing; mounting will detect
# what's there (with blkid) and work with anything
osd mkfs type = btrfs

I will prepare a patch with these for the current mkcephfs and init-ceph, incl. aliases for the old keys and cmdline options where possible.

# this may go away with mkcephfs 2.0, and it will have to get more
# complex if we provide something for journals too, etc, because you
# may want to pair specific data disks to specific journals (DH has
# this need).. haven't had time to think it through, which is why i'm
# leaning toward and here's a hook where you run something on the
# host that calls ceph-disk-prepare etc on all the disks you want,
# and using uuids to match journals to data disks -- this work has
# not yet started)
osd fs devs = /dev/sdb /dev/sdc

This all looks good to me. What do you think, Danny?

This part (osd fs devs) is for a new mkcephfs 2.0, if I understand you correctly. Sounds okay to me atm. (Tommi: Is there any new information on this? Did you already start to work on this?)

Danny
Re: changed rbd cp behavior in 0.53
On 11/12/2012 02:47 PM, Josh Durgin wrote: On 11/12/2012 08:30 AM, Andrey Korolyov wrote: Hi, For this version, rbd cp assumes that destination pool is the same as source, not 'rbd', if pool in the destination path is omitted. rbd cp install/img testimg rbd ls install img testimg Is this change permanent? Thanks! This is a regression. The previous behavior will be restored for 0.54. I added http://tracker.newdream.net/issues/3478 to track it. Actually, on detailed examination, it looks like this has been the behavior for a long time; I think the wiser course would be not to change this defaulting. One could argue the value of such defaulting, but it's also true that you can specify the source and destination pools explicitly. Andrey, any strong objection to leaving this the way it is? -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Authorization issues in the 0.54
On Wed, Nov 14, 2012 at 4:20 AM, Andrey Korolyov and...@xdel.ru wrote:

Hi, in 0.54 cephx is probably broken somehow:

$ ceph auth add client.qemukvm osd 'allow *' mon 'allow *' mds 'allow *' -i qemukvm.key
2012-11-14 15:51:23.153910 7ff06441f780 -1 read 65 bytes from qemukvm.key
added key for client.qemukvm

$ ceph auth list
...
client.admin
    key: [xx]
    caps: [mds] allow *

Note that for mds you just specify 'allow' and not 'allow *'. It shouldn't affect the stuff that you're testing though.

    caps: [mon] allow *
    caps: [osd] allow *
client.qemukvm
    key: [yy]
    caps: [mds] allow *
    caps: [mon] allow *
    caps: [osd] allow *
...

$ virsh secret-set-value --secret uuid --base64 yy
(set username in the VM's xml...)
$ virsh start testvm
kvm: -drive file=rbd:rbd/vm0:id=qemukvm:key=yy:auth_supported=cephx\;none:mon_host=192.168.10.125\:6789\;192.168.10.127\:6789\;192.168.10.129\:6789,if=none,id=drive-virtio-disk0,format=raw: could not open disk image rbd:rbd/vm0:id=qemukvm:key=yy:auth_supported=cephx\;none:mon_host=192.168.10.125\:6789\;192.168.10.127\:6789\;192.168.10.129\:6789: Operation not permitted

$ virsh secret-set-value --secret uuid --base64 xx
(set username again to admin for the VM's disk)
$ virsh start testvm

Finally, the vm started successfully. All rbd commands issued from the cli work okay with the appropriate credentials, and the qemu binary was linked with the same librbd as the running one. Does anyone have a suggestion?

There wasn't any change that I'm aware of that should make that happen. Can you reproduce it with 'debug ms = 1' and 'debug auth = 20'?

Thanks, Yehuda
Re: problem with ceph and btrfs patch: set journal_info in async trans commit worker
Hi, Stefan

On Wed, 14 Nov 2012 14:42:07 +0100, Stefan Priebe - Profihost AG wrote:

Hello list, i wanted to try out ceph with the latest vanilla kernel 3.7-rc5. I was seeing a massive performance degradation. I see around 22x btrfs-endio-write processes every 10-20 seconds, and they run a long time while consuming a massive amount of CPU. So my performance of 23.000 iops drops to an up and down of 23.000 iops to 0 - avg is now 2500 iops instead of 23.000. Git bisect shows me commit e209db7ace281ca347b1ac699bf1fb222eac03fe (Btrfs: set journal_info in async trans commit worker) as the problematic patch. When i revert this one everything is fine again. Is this known?

Could you try the following patch? http://marc.info/?l=linux-btrfs&m=135175512030453&w=2

I think the patch "Btrfs: set journal_info in async trans commit worker" is not the real reason that caused the regression. I guess it is caused by a bug in the reservation. When we join the same transaction handle more than 2 times, the pointer to the reservation in the transaction handle would be lost, and the statistical data in the reservation would be corrupted. And then we would trigger the space flush, which may block your tasks.

Thanks, Miao

Greets, Stefan
Re: Small feature request for v0.55 release
My personal preference would be for ${name}-${version}.tar.bz2 as well, but 2nd place would be ${name}-stable-${version}.tar.bz2.

On Wed, Nov 14, 2012 at 3:47 PM, Tren Blackburn <t...@eotnetworks.com> wrote:
> On Wed, Nov 14, 2012 at 3:40 PM, Jimmy Tang <jt...@tchpc.tcd.ie> wrote:
>> On 14 Nov 2012, at 16:14, Sage Weil wrote:
>>> Appending the codename to the version string is something we did with argonaut (0.48argonaut) just to make it obvious to users which stable version they are on. How do people feel about that? Is it worthwhile? Useless? Ugly? We can certainly skip it for 0.55 bobtail...
>>
>> Just throwing in some thoughts, but how about a scheme like ${name}-stable-${version}.tar.bz2, with the corresponding directory structure inside, and just ditch code names in the tarball filename? It doesn't look as nice without a codename, but it makes it absolutely clear to new users that it is a stable release.
>
> Personally, I'd prefer standard naming of ${name}-${version}.tar.bz2. You make it clear on your site which version is the LTS release, and which are the developer releases.
>
> t.
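For comparison, the three schemes floated in this thread, spelled out with the example values from Sage's message (a sketch, not anything the build actually does):

```python
# The tarball naming schemes discussed above, using the 0.48/argonaut
# example from the thread.
name, version, codename = "ceph", "0.48", "argonaut"

schemes = {
    "codename-suffixed": f"{name}-{version}{codename}.tar.bz2",   # what argonaut did
    "plain":             f"{name}-{version}.tar.bz2",             # standard naming
    "stable-tagged":     f"{name}-stable-{version}.tar.bz2",      # Jimmy's proposal
}

for label, filename in schemes.items():
    print(f"{label}: {filename}")
```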
Re: changed rbd cp behavior in 0.53
On Thu, Nov 15, 2012 at 4:56 AM, Dan Mick <dan.m...@inktank.com> wrote:
> On 11/12/2012 02:47 PM, Josh Durgin wrote:
>> On 11/12/2012 08:30 AM, Andrey Korolyov wrote:
>>> Hi,
>>>
>>> For this version, rbd cp assumes that the destination pool is the same as the source, not 'rbd', if the pool in the destination path is omitted.
>>>
>>> $ rbd cp install/img testimg
>>> $ rbd ls install
>>> img
>>> testimg
>>>
>>> Is this change permanent? Thanks!
>>
>> This is a regression. The previous behavior will be restored for 0.54. I added http://tracker.newdream.net/issues/3478 to track it.
>
> Actually, on detailed examination, it looks like this has been the behavior for a long time; I think the wiser course would be not to change this defaulting. One could argue the value of such defaulting, but it's also true that you can specify the source and destination pools explicitly.
>
> Andrey, any strong objection to leaving this the way it is?

I'm not complaining - this behavior seems more logical in the first place, and of course I use the full path even when doing something by hand.
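The two defaulting behaviors under discussion can be sketched as follows (an illustration of the logic, not the rbd CLI's actual code):

```python
def resolve_dest(src_spec, dst_spec, default_like_source=True):
    """Resolve the destination of 'rbd cp SRC DST' when DST omits the pool.

    default_like_source=True  -> the behavior Andrey observed (source pool).
    default_like_source=False -> omitted pool always means 'rbd'.
    """
    src_pool, _, _ = src_spec.partition("/")
    if "/" in dst_spec:
        return dst_spec                      # pool given explicitly, use it
    pool = src_pool if default_like_source else "rbd"
    return f"{pool}/{dst_spec}"

# Behavior reported in the thread: testimg lands in the source pool.
print(resolve_dest("install/img", "testimg", default_like_source=True))
# The alternative defaulting: testimg lands in 'rbd'.
print(resolve_dest("install/img", "testimg", default_like_source=False))
```

Either way, as Dan notes, spelling out both pools explicitly (`rbd cp install/img rbd/testimg`) sidesteps the defaulting entirely.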
OSD crash on 0.48.2argonaut
Dear All:

I met this issue on one of my osd nodes. Is this a known issue? Thanks!

ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
 1: /usr/bin/ceph-osd() [0x6edaba]
 2: (()+0xfcb0) [0x7f08b112dcb0]
 3: (gsignal()+0x35) [0x7f08afd09445]
 4: (abort()+0x17b) [0x7f08afd0cbab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f08b065769d]
 6: (()+0xb5846) [0x7f08b0655846]
 7: (()+0xb5873) [0x7f08b0655873]
 8: (()+0xb596e) [0x7f08b065596e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1de) [0x7a82fe]
 10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x693) [0x530f83]
 11: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, eversion_t)+0x159) [0x531ac9]
 12: (ReplicatedPG::sub_op_modify_reply(std::tr1::shared_ptr<OpRequest>)+0x15c) [0x53251c]
 13: (ReplicatedPG::do_sub_op_reply(std::tr1::shared_ptr<OpRequest>)+0x81) [0x54d241]
 14: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1e3) [0x600883]
 15: (OSD::dequeue_op(PG*)+0x238) [0x5bfaf8]
 16: (ThreadPool::worker()+0x4d5) [0x79f835]
 17: (ThreadPool::WorkThread::entry()+0xd) [0x5d87cd]
 18: (()+0x7e9a) [0x7f08b1125e9a]
 19: (clone()+0x6d) [0x7f08afdc54bd]

NOTE: a copy of the executable, or `objdump -rdS <executable>`, is needed to interpret this.
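When reporting or triaging traces like this, it can help to strip the frames down to just their symbol names (frames 9-15 here show the assert fired inside ReplicatedPG::eval_repop while handling a replica's op reply). A small sketch for extracting them from the ` N: (symbol+0xoff) [0xaddr]` frame format above:

```python
import re

# Matches frames of the form " N: (symbol+0xoffset) [0xaddress]".
FRAME_RE = re.compile(r'^\s*(\d+):\s+\((.*?)\+0x[0-9a-f]+\)\s+\[0x[0-9a-f]+\]')

def parse_frames(backtrace_text):
    """Return (frame_number, symbol) pairs; frames without a symbol are skipped."""
    frames = []
    for line in backtrace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2)))
    return frames

bt = """ 10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x693) [0x530f83]
 11: (ReplicatedPG::repop_ack(ReplicatedPG::RepGather*, int, int, int, eversion_t)+0x159) [0x531ac9]"""
for num, symbol in parse_frames(bt):
    print(num, symbol)
```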