Re: ceph-standalone

2014-02-25 Thread Loic Dachary
On 25/02/2014 04:34, Ricardo Rocha wrote: I did it mostly to learn about docker (and because i wanted a 'cluster in the office' like Loic had :-)). Not incredibly pretty though. :-) Robert and Stephan created this nice office cluster and it would be nice to know how it evolved. My

Re: ceph-standalone

2014-02-25 Thread Robert Sander
On 25.02.2014 09:33, Loic Dachary wrote: On 25/02/2014 04:34, Ricardo Rocha wrote: I did it mostly to learn about docker (and because i wanted a 'cluster in the office' like Loic had :-)). Not incredibly pretty though. :-) Robert and Stephan created this nice office cluster and it would

Libvirt patches for Ubuntu 14.04

2014-02-25 Thread Wido den Hollander
Hi James, Ceph-dev, Some time ago you manually pushed a patch for libvirt to the 14.04 repositories, but this patch [0] has now made it upstream and will be in libvirt 1.2.2. It's a patch for creating RBD images with format 2 by default. Another patch [1] of mine just got accepted into upstream,

Re: [PATCH 2/6] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: This is primarily for rbd's benefit and is supposed to combat fragmentation: ... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs

Re: [PATCH 4/6] rbd: do not hard-code CEPH_OSD_MAX_OP in rbd_osd_req_callback()

2014-02-25 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: CEPH_OSD_MAX_OP value in rbd_osd_req_callback() is hard-coded to 2. Fix it. Please squash this in with the previous patch (at least). Change the BUG_ON() to rbd_assert() while you're

Re: [PATCH 0/6] libceph: CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 4:58 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: Hello, This series adds support for CEPH_OSD_OP_SETALLOCHINT osd op to libceph along with adjusting rbd to make use of it. The rationale and the basic design was outlined in the

Re: [PATCH 6/6] rbd: prefix rbd writes with CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: In an effort to reduce fragmentation, prefix every rbd write with a CEPH_OSD_OP_SETALLOCHINT osd op with an expected_write_size value set to the object size (1 << order). Backwards
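
As a rough illustration of the arithmetic in the patch description above: the object size is a power of two derived from the image's order. The sketch below is not the kernel code. `object_size_from_order` is a hypothetical helper name, and order 22 is used in the note that follows only because it corresponds to the 4m objects mentioned elsewhere in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from the patch): object size in bytes
// for a given rbd object order, i.e. 1 << order.
std::uint64_t object_size_from_order(std::uint8_t order) {
  // A shift by 64 or more would be undefined for a 64-bit value.
  assert(order < 64);
  return std::uint64_t{1} << order;
}
```

For example, `object_size_from_order(22)` yields 4194304 bytes (4 MiB), which under this assumption would be the expected_write_size passed with each write.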

Re: [PATCH 2/6] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Alex Elder
On 02/25/2014 06:52 AM, Ilya Dryomov wrote: On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: This is primarily for rbd's benefit and is supposed to combat fragmentation: ... knowing that rbd images have a 4m size, librbd can pass a

Re: [PATCH 6/6] rbd: prefix rbd writes with CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Alex Elder
On 02/25/2014 06:58 AM, Ilya Dryomov wrote: On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: In an effort to reduce fragmentation, prefix every rbd write with a CEPH_OSD_OP_SETALLOCHINT osd op with an expected_write_size value set

Re: [PATCH 2/6] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
On Tue, Feb 25, 2014 at 3:05 PM, Alex Elder el...@ieee.org wrote: The other thing is that the expected size is limited by rbd_image_header->obj_order, which is a single byte. I think you should encode this the same way. Even if the hint were for more than RBD, this level of granularity may

Assertion error in librados

2014-02-25 Thread Filippos Giannakos
Hello all, We recently bumped into the following assertion error in librados on our production service: common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fa2c2ccf700 time 2014-02-21 07:23:26.340791 common/Mutex.cc: 93: FAILED assert(r == 0) ceph version 0.72.2

Re: Assertion error in librados

2014-02-25 Thread Gregory Farnum
Do you have logs? The assert indicates that the messenger got back something other than okay when trying to grab a local Mutex, which shouldn't be able to happen. It may be that some error-handling path didn't drop it (within the same thread that later tried to grab it again), but we'll need more

[PATCH v2 2/5] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
This is primarily for rbd's benefit and is supposed to combat fragmentation: ... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs allocation size ioctl on new files so that they are allocated in 1m or 4m chunks. We've seen cases where users with rbd

[PATCH v2 4/5] rbd: num_ops parameter for rbd_osd_req_create()

2014-02-25 Thread Ilya Dryomov
In preparation for prefixing rbd writes with an allocation hint introduce a num_ops parameter for rbd_osd_req_create(). The rationale is that not every write request is a write op that needs to be prefixed (e.g. watch op), so the num_ops logic needs to be in the callers. Signed-off-by: Ilya

[PATCH v2 0/5] libceph: CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
Hello, This is a v2 of the CEPH_OSD_OP_SETALLOCHINT osd op series. Incorporated are Sage's and Alex's comments. The rationale and the basic design is outlined in the rados io hints thread on ceph-devel. Discussion in the wip-hint thread is still on-going, but where it's headed is unclear. This

[PATCH v2 1/5] libceph: encode CEPH_OSD_OP_FLAG_* op flags

2014-02-25 Thread Ilya Dryomov
Encode ceph_osd_op::flags field so that it gets sent over the wire. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com Reviewed-by: Sage Weil s...@inktank.com Reviewed-by: Alex Elder el...@linaro.org --- include/linux/ceph/osd_client.h |1 + include/linux/ceph/rados.h |2 +-

Re: Assertion error in librados

2014-02-25 Thread Yehuda Sadeh
Looks to me like we try to send a message in the handle_osd_map when we are still under the lock that we try to grab. Yehuda On Tue, Feb 25, 2014 at 7:28 AM, Gregory Farnum g...@inktank.com wrote: Do you have logs? The assert indicates that the messenger got back something other than okay when

Re: Assertion error in librados

2014-02-25 Thread Noah Watkins
Perhaps using gtest-style asserts (ASSERT_EQ(r, 0)) in Ceph would be useful so we can see parameter values to the assertion in the log. In this case, the return value from pthread_mutex_lock is almost certainly EINVAL, but it'd be informative to know for sure. On Tue, Feb 25, 2014 at 7:58 AM,
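
A minimal sketch of the idea in the message above, assuming nothing about Ceph's actual assert machinery: a check that records the compared values so the log shows what the call returned. `check_eq` is a hypothetical helper name, and the message format only imitates the `FAILED assert(r == 0)` line quoted earlier in the thread.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Ceph or gtest): compare two ints and,
// on mismatch, build a message containing the expression, the expected
// value, the actual value, and the strerror() text for the actual value.
// Returns an empty string when the values match.
std::string check_eq(const char* expr, int actual, int expected) {
  if (actual == expected) {
    return std::string();
  }
  std::ostringstream msg;
  msg << "FAILED assert(" << expr << " == " << expected << "): got "
      << actual << " (" << std::strerror(actual) << ")";
  return msg.str();
}
```

With a check of this shape, a failing lock call would log its numeric return value (for instance the value of EINVAL) rather than only the bare expression, which is the extra information being asked for here.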

Re: Fwd: wip-hint

2014-02-25 Thread Sage Weil
On Sun, 23 Feb 2014, Yehuda Sadeh wrote: On Sun, Feb 23, 2014 at 11:25 AM, Sage Weil s...@inktank.com wrote: On Sun, 23 Feb 2014, Yehuda Sadeh wrote: (resending to get through the list) My main concern is that the whole notion of hint needs to be abstracted. Instead of having a specific op

Re: Assertion error in librados

2014-02-25 Thread Gregory Farnum
Nope; it's an entirely local problem. I'm kind of surprised there wasn't more logging available in the same location you got the core dump, but it's possible the log generation is turned off (in addition to the log dumping). The Dispatch lock and the messenger lock are distinct, Yehuda, and the

Re: Assertion error in librados

2014-02-25 Thread Josh Durgin
That's a good idea. This particular assert in a Mutex is almost always a use-after-free of the Mutex or structure containing it though. On 02/25/2014 09:33 AM, Noah Watkins wrote: Perhaps using gtest-style asserts (ASSERT_EQ(r, 0)) in Ceph would be useful so we can see parameter values to the

Re: Fwd: wip-hint

2014-02-25 Thread Yehuda Sadeh
This was discussed off list, but here's my take: On Tue, Feb 25, 2014 at 9:38 AM, Sage Weil s...@inktank.com wrote: On Sun, 23 Feb 2014, Yehuda Sadeh wrote: On Sun, Feb 23, 2014 at 11:25 AM, Sage Weil s...@inktank.com wrote: On Sun, 23 Feb 2014, Yehuda Sadeh wrote: (resending to get through

CDS Giant Schedule

2014-02-25 Thread Patrick McGarry
Greetings! Just wanted to let people know that the schedule has been published for the next Ceph Developer Summit (March 04-05, 2014): https://wiki.ceph.com/Planning/CDS/CDS_Giant_(Mar_2014) There may still be a few last minute tweaks, but for the most part that should be what we're working

Re: Assertion error in librados

2014-02-25 Thread Noah Watkins
On Tue, Feb 25, 2014 at 9:51 AM, Josh Durgin josh.dur...@inktank.com wrote: That's a good idea. This particular assert in a Mutex is almost always a use-after-free of the Mutex or structure containing it though. I think that a use-after-free will also throw an EINVAL (assuming it isn't a

Re: [ceph-users] PG folder hierarchy

2014-02-25 Thread Gregory Farnum
On Tue, Feb 25, 2014 at 7:13 PM, Guang yguan...@yahoo.com wrote: Hello, Most recently when looking at PG's folder splitting, I found that there was only one sub folder in the top 3 / 4 levels and start having 16 sub folders starting from level 6, what is the design consideration behind this?