Re: [PATCHv5] rbd block driver fix race between aio completition and aio cancel

2012-11-30 Thread Stefan Hajnoczi
On Thu, Nov 29, 2012 at 10:37 PM, Stefan Priebe s.pri...@profihost.ag wrote: @@ -568,6 +562,10 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb) { RBDAIOCB *acb = (RBDAIOCB *) blockacb; acb->cancelled = 1; + +while (acb->status == -EINPROGRESS) { +

[PATCHv6] rbd block driver fix race between aio completition and aio cancel

2012-11-30 Thread Stefan Priebe
This one fixes a race, which qemu also had in the iscsi block driver, between cancellation and I/O completion. qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. To achieve this, it introduces a new status flag which uses -EINPROGRESS. Changes since PATCHv5: -
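The idea described above (cancel blocks until the completion has actually run) can be sketched in a few lines. This is only a single-threaded illustration, not the patch itself: demo_req, demo_submit, demo_poll_once and demo_cancel are hypothetical names, and demo_poll_once merely stands in for one iteration of an event loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request type for illustration; not QEMU's RBDAIOCB. */
struct demo_req {
    int status;      /* -EINPROGRESS until the completion has run */
    int cancelled;   /* set by the canceller, read by the completion */
    int user_cb_ran; /* 1 if the caller's callback would have been invoked */
};

/* One queued completion, standing in for work owned by an event loop. */
static struct demo_req *pending;
static int pending_result;

static void demo_submit(struct demo_req *req, int result)
{
    req->status = -EINPROGRESS;
    req->cancelled = 0;
    req->user_cb_ran = 0;
    pending = req;
    pending_result = result;
}

/* Stand-in for one event-loop iteration: deliver the queued completion. */
static void demo_poll_once(void)
{
    struct demo_req *req = pending;

    if (req == NULL)
        return;
    pending = NULL;
    if (!req->cancelled)
        req->user_cb_ran = 1;      /* a cancelled request skips the callback */
    req->status = pending_result;  /* leaving -EINPROGRESS ends the wait */
}

/* Cancel synchronously: do not return while the request is still in flight.
 * Assumes the request was submitted, so a completion is actually queued. */
static void demo_cancel(struct demo_req *req)
{
    assert(req != NULL);
    req->cancelled = 1;
    while (req->status == -EINPROGRESS)
        demo_poll_once();
}
```

Once demo_cancel returns, the request has left the -EINPROGRESS state, so the caller can release it without a late completion touching freed memory.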

Re: [PATCHv5] rbd block driver fix race between aio completition and aio cancel

2012-11-30 Thread Stefan Priebe - Profihost AG
Fixed in V6. On 30.11.2012 09:26, Stefan Hajnoczi wrote: On Thu, Nov 29, 2012 at 10:37 PM, Stefan Priebe s.pri...@profihost.ag wrote: @@ -568,6 +562,10 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb) { RBDAIOCB *acb = (RBDAIOCB *) blockacb; acb->cancelled = 1; + +

Re: Hangup during scrubbing - possible solutions

2012-11-30 Thread Andrey Korolyov
http://xdel.ru/downloads/ceph-log/ceph-scrub-stuck.log.gz http://xdel.ru/downloads/ceph-log/cluster-w.log.gz Here, please. I initiated a deep-scrub of osd.1, which led to permanently stuck I/O requests within a short time (a regular scrub will do the same). The second log may be useful for proper timestamps,

Re: [PATCHv6] rbd block driver fix race between aio completition and aio cancel

2012-11-30 Thread Stefan Hajnoczi
On Fri, Nov 30, 2012 at 9:55 AM, Stefan Priebe s.pri...@profihost.ag wrote: This one fixes a race, which qemu also had in the iscsi block driver, between cancellation and I/O completion. qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. To achieve this, it introduces

[PATCH 0/2] rbd: fix two memory leaks

2012-11-30 Thread Alex Elder
This series fixes two memory leaks that occur whenever a special (non-I/O) osd request is issued in rbd. -Alex [PATCH 1/2] rbd: don't leak rbd_req on synchronous requests [PATCH 2/2] rbd: don't leak rbd_req for rbd_req_sync_notify_ack()

[PATCH 1/2] rbd: don't leak rbd_req on synchronous requests

2012-11-30 Thread Alex Elder
When rbd_do_request() is called it allocates and populates an rbd_req structure to hold information about the osd request to be sent. This is done for the benefit of the callback function (in particular, rbd_req_cb()), which uses this in processing when the request completes. Synchronous

[PATCH 2/2] rbd: don't leak rbd_req for rbd_req_sync_notify_ack()

2012-11-30 Thread Alex Elder
When rbd_req_sync_notify_ack() calls rbd_do_request() it supplies rbd_simple_req_cb() as its callback function. Because the callback is supplied, an rbd_req structure gets allocated and populated so it can be used by the callback. However rbd_simple_req_cb() is not freeing (or even using) the
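The ownership rule behind both leak fixes can be illustrated with a small user-space sketch. It is not the kernel code: demo_ctx, demo_do_request and demo_fixed_cb are hypothetical stand-ins, and the assumption made here is simply that whoever ends up holding the per-request context (the callback, or the submit path when no callback is given) must free it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-request context; stands in for the role rbd_req plays. */
struct demo_ctx {
    int tag;
};

/* Number of contexts currently allocated; nonzero after a request means a leak. */
static int demo_live_ctx;

typedef void (*demo_cb)(struct demo_ctx *ctx);

static void demo_ctx_free(struct demo_ctx *ctx)
{
    assert(demo_live_ctx > 0);
    free(ctx);
    demo_live_ctx--;
}

/* A callback that takes ownership of the context must release it,
 * even if it has no other use for the contents. */
static void demo_fixed_cb(struct demo_ctx *ctx)
{
    demo_ctx_free(ctx);
}

/* Assumption for this sketch: the context is always allocated, and the
 * request "completes" immediately so the example stays synchronous. */
static int demo_do_request(demo_cb cb, int tag)
{
    struct demo_ctx *ctx = malloc(sizeof(*ctx));

    if (ctx == NULL)
        return -1;
    ctx->tag = tag;
    demo_live_ctx++;

    if (cb != NULL)
        cb(ctx);            /* ownership passes to the callback */
    else
        demo_ctx_free(ctx); /* no callback: the submit path cleans up */
    return 0;
}
```

A callback that ignores its argument would leave demo_live_ctx above zero after every request, which is the shape of leak the two patches describe.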

[PATCH] libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed

2012-11-30 Thread Jim Schutt
Add libceph support for a new CRUSH tunable recently added to Ceph servers. Consider the CRUSH rule: step chooseleaf firstn 0 type node_type. This rule means that n replicas will be chosen in a manner such that each chosen leaf's branch will contain a unique instance of node_type. When an
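As a rough illustration of "retry the descent from the root when the chosen leaf is failed", here is a toy two-level map. Everything in it is an assumption for the example: demo_map, demo_pick and demo_chooseleaf are hypothetical, and demo_pick is a trivial placeholder for the hash-based bucket selection that real CRUSH performs.

```c
#include <assert.h>

#define DEMO_HOSTS 2
#define DEMO_LEAVES_PER_HOST 2

/* Toy map: each host (the node_type in this example) holds a few leaves. */
struct demo_map {
    int failed[DEMO_HOSTS][DEMO_LEAVES_PER_HOST]; /* 1 = leaf is failed */
};

/* Placeholder placement function; NOT the CRUSH hash. */
static int demo_pick(int x, int attempt, int n)
{
    return (x + attempt) % n;
}

/* Returns a leaf id (host * DEMO_LEAVES_PER_HOST + leaf), or -1 if every
 * attempt ended on a failed leaf. */
static int demo_chooseleaf(const struct demo_map *map, int x, int max_tries)
{
    assert(x >= 0 && max_tries >= 0);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        /* Each attempt starts again at the root: pick a host, then a leaf. */
        int host = demo_pick(x, attempt, DEMO_HOSTS);
        int leaf = demo_pick(x, attempt, DEMO_LEAVES_PER_HOST);

        if (!map->failed[host][leaf])
            return host * DEMO_LEAVES_PER_HOST + leaf;
        /* Failed leaf: fall through and redo the whole descent. */
    }
    return -1;
}
```

Because a failed leaf triggers a fresh choice at the host level as well, the retry can land under a different host instead of being confined to the branch that was picked first.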

Re: OSD daemon changes port no

2012-11-30 Thread Sage Weil
What kernel version and mds version are you running? I did # ceph osd pool create foo 12 # ceph osd pool create bar 12 # ceph mds add_data_pool 3 # ceph mds add_data_pool 4 and from a kernel mount # mkdir foo # mkdir bar # cephfs foo set_layout --pool 3 # cephfs bar set_layout --pool 4 #

Re: rbd map command hangs for 15 minutes during system start up

2012-11-30 Thread Nick Bartos
My initial tests using a 3.5.7 kernel with the 55 patches from wip-nick are going well. So far I've gone through 8 installs without an incident; I'll let it run for a bit longer to see if it crops up again. Can I get a branch with these patches integrated into all of the backported patches to

Re: rbd map command hangs for 15 minutes during system start up

2012-11-30 Thread Alex Elder
On 11/30/2012 12:49 PM, Nick Bartos wrote: My initial tests using a 3.5.7 kernel with the 55 patches from wip-nick are going well. So far I've gone through 8 installs without an incident; I'll let it run for a bit longer to see if it crops up again. This is great news! Now I wonder which

Re: Hangup during scrubbing - possible solutions

2012-11-30 Thread Samuel Just
Hah! Thanks for the log, it's our handling of active_pushes. I'll have a patch shortly. Thanks! -Sam On Fri, Nov 30, 2012 at 4:14 AM, Andrey Korolyov and...@xdel.ru wrote: http://xdel.ru/downloads/ceph-log/ceph-scrub-stuck.log.gz http://xdel.ru/downloads/ceph-log/cluster-w.log.gz Here,

Review request: wip-localized-read-tests

2012-11-30 Thread Noah Watkins
I've pushed up patches for the first phase of testing read-from-replica functionality, which looks only at objecter/client-level ops: wip-localized-read-tests The major points are: 1. Run libcephfs tests w/ and w/o localized reads enabled 2. Add the performance counter in Objecter to

Re: rbd map command hangs for 15 minutes during system start up

2012-11-30 Thread Alex Elder
On 11/29/2012 02:37 PM, Alex Elder wrote: On 11/22/2012 12:04 PM, Nick Bartos wrote: Here are the ceph log messages (including the libceph kernel debug stuff you asked for) from a node boot with the rbd command hung for a couple of minutes: I'm sorry, but I did something stupid... Yes, the

librbd: error finding header: (2) No such file or directory

2012-11-30 Thread Simon Frerichs | Fremaks GmbH
Hi, we are starting to see this error on some images: rbd info kvm1207 error opening image kvm1207: (2) No such file or directory 2012-12-01 02:58:27.556677 7ffd50c60760 -1 librbd: error finding header: (2) No such file or directory Is there any way to fix these images? Best regards, Simon

Re: endless flying slow requests

2012-11-30 Thread Samuel Just
I've pushed a fix to next, 49f32cee647c5bd09f36ba7c9fd4f481a697b9d7. Let me know if the problem persists with this patch. -Sam On Wed, Nov 28, 2012 at 2:04 PM, Andrey Korolyov and...@xdel.ru wrote: On Thu, Nov 29, 2012 at 1:12 AM, Samuel Just sam.j...@inktank.com wrote: Also, these clusters

Re: Hangup during scrubbing - possible solutions

2012-11-30 Thread Samuel Just
Just pushed a fix to next, 49f32cee647c5bd09f36ba7c9fd4f481a697b9d7. Let me know if it persists. Thanks for the logs! -Sam On Fri, Nov 30, 2012 at 2:04 PM, Samuel Just sam.j...@inktank.com wrote: Hah! Thanks for the log, it's our handling of active_pushes. I'll have a patch shortly.