[PATCH 0/7] rbd: implement single-major device number allocation scheme

2013-12-16 Thread Ilya Dryomov
at once. See individual commit messages for details. Fixes: http://tracker.ceph.com/issues/5048 Thanks, Ilya Ilya Dryomov (7): rbd: rbd_device::dev_id is an int, format it as such rbd: tweak loaded message and module description rbd: refactor rbd_init() a bit rbd: switch

[PATCH 2/7] rbd: tweak loaded message and module description

2013-12-16 Thread Ilya Dryomov
Tweak loaded message, so that it looks like [ 30.184235] rbd: loaded instead of [ 38.056564] rbd: loaded rbd (rados block device) Also move (and slightly tweak) MODULE_DESCRIPTION so that all authors are next to each other in modinfo output. Signed-off-by: Ilya Dryomov ilya.dryo

[PATCH 5/7] rbd: add 'minor' sysfs rbd device attribute

2013-12-16 Thread Ilya Dryomov
Introduce /sys/bus/rbd/devices/<id>/minor sysfs attribute for exporting rbd whole disk minor numbers. This is a step towards single-major device number allocation scheme, but also a good thing on its own. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com Reviewed-by: Alex Elder el...@linaro.org

[PATCH 4/7] rbd: switch to ida for rbd id assignments

2013-12-16 Thread Ilya Dryomov
completely unpredictable. So, in preparation for single-major device number allocation scheme, which is going to establish and rely on a constant mapping between rbd ids and device numbers, switch to ida for rbd id assignments. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com Reviewed-by: Alex Elder
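
To illustrate why an ida-style allocator gives a stable id-to-device-number mapping, here is a minimal userspace sketch of a "lowest free id" allocator. It is not the kernel ida API; `id_pool`, `id_pool_get()`, `id_pool_put()` and the 64-id capacity are hypothetical names and limits chosen for the example.

```c
#define ID_POOL_SIZE 64 /* hypothetical capacity for this sketch */

struct id_pool {
	unsigned long long used; /* bit i set => id i is allocated */
};

/* Return the smallest free id, or -1 if the pool is exhausted. */
int id_pool_get(struct id_pool *pool)
{
	for (int id = 0; id < ID_POOL_SIZE; id++) {
		if (!(pool->used & (1ULL << id))) {
			pool->used |= 1ULL << id;
			return id;
		}
	}
	return -1;
}

/* Release an id so that a later id_pool_get() can hand it out again. */
void id_pool_put(struct id_pool *pool, int id)
{
	if (id >= 0 && id < ID_POOL_SIZE)
		pool->used &= ~(1ULL << id);
}
```

Unlike an ever-growing counter, a released id is reused by the next allocation, so ids stay small and dense.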

[PATCH 1/7] rbd: rbd_device::dev_id is an int, format it as such

2013-12-16 Thread Ilya Dryomov
rbd_device::dev_id is an int, format it as such. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com Reviewed-by: Alex Elder el...@linaro.org Reviewed-by: Josh Durgin josh.dur...@inktank.com --- drivers/block/rbd.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git

[PATCH 6/7] rbd: wire up is_visible() sysfs callback for rbd bus

2013-12-16 Thread Ilya Dryomov
In preparation for single-major device number allocation scheme, wire up attribute_group::is_visible() callback for rbd bus. This allows us to make the new single-major attributes conditional. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com Reviewed-by: Alex Elder el...@linaro.org Reviewed

[PATCH] rbd: enable extended devt in single-major mode

2013-12-16 Thread Ilya Dryomov
... 259 23 0 rbd1p39 259 24 0 rbd1p40 251 32 1024 rbd2 251 33 0 rbd2p1 251 34 0 rbd2p2 (major 251 was assigned dynamically at module load time) Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com Reviewed-by: Josh Durgin

[PATCH 0/2] rbd: tear down watch request if rbd_dev_device_setup() fails

2013-12-16 Thread Ilya Dryomov
, Ilya Ilya Dryomov (2): rbd: introduce rbd_dev_header_unwatch_sync() and switch to it rbd: tear down watch request if rbd_dev_device_setup() fails drivers/block/rbd.c | 41 - 1 file changed, 28 insertions(+), 13 deletions

[PATCH 1/2] rbd: introduce rbd_dev_header_unwatch_sync() and switch to it

2013-12-16 Thread Ilya Dryomov
Rename rbd_dev_header_watch_sync() to __rbd_dev_header_watch_sync() and introduce two helpers: rbd_dev_header_{,un}watch_sync() to make it more clear what is going on. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- drivers/block/rbd.c | 35 ++- 1 file

[PATCH 2/2] rbd: tear down watch request if rbd_dev_device_setup() fails

2013-12-16 Thread Ilya Dryomov
Tear down watch request if rbd_dev_device_setup() fails. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- drivers/block/rbd.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index c91108b760cf..e709f4ae117f 100644 --- a/drivers/block

Re: New Defects reported by Coverity Scan for ceph (fwd)

2013-12-17 Thread Ilya Dryomov
On Mon, Dec 16, 2013 at 6:07 PM, Sage Weil s...@inktank.com wrote: -- Forwarded message -- From: scan-ad...@coverity.com To: undisclosed-recipients:; Cc: Date: Mon, 16 Dec 2013 00:57:57 -0800 Subject: New Defects reported by Coverity Scan for ceph Hi, Please find the

Re: gitbuilder update request

2013-12-17 Thread Ilya Dryomov
Whoever is going to be doing this should probably make sure that https://github.com/ceph/autobuild-ceph/pull/5 is merged before rolling stuff out. (I have already installed libblkid-dev on pretty much all gitbuilders by hand, but it wouldn't hurt to capture both pytest and libblkid-dev in a

[PATCH 04/19] crush: fix some comments

2013-12-23 Thread Ilya Dryomov
Reflects ceph.git commit 3cef755428761f2481b1dd0e0fbd0464ac483fc5. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c index 71192b1f8501

[PATCH 06/19] crush: return CRUSH_ITEM_UNDEF for failed placements with indep

2013-12-23 Thread Ilya Dryomov
commit b1d4dd4eb044875874a1d01c01c7d766db5d0a80. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/crush/crush.h |3 ++- net/ceph/crush/mapper.c |8 ++-- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/crush/crush.h b/include/linux

[PATCH 12/19] crush: new SET_CHOOSE_LEAF_TRIES command

2013-12-23 Thread Ilya Dryomov
the rep pools). (We should do the same for the other tunables, by the way!) Reflects ceph.git commit c43c893be872f709c787bc57f46c0e97876ff681. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/crush/crush.h |2 ++ net/ceph/crush/mapper.c | 31

[PATCH 08/19] crush: add note about r in recursive choose

2013-12-23 Thread Ilya Dryomov
Reflects ceph.git commit 4551fee9ad89d0427ed865d766d0d44004d3e3e1. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c |8 1 file changed, 8 insertions(+) diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c index caeb1066bea3..77b7a73e65cf

[PATCH 09/19] crush: strip firstn conditionals out of crush_choose, rename

2013-12-23 Thread Ilya Dryomov
mode. This appears to have happened way back in commit dae8bec9 (or earlier)... 2007. Reflects ceph.git commit 94350996cb2035850bcbece6a77a9b0394177ec9. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c | 88 ++- 1 file

[PATCH 05/19] crush: eliminate CRUSH_MAX_SET result size limitation

2013-12-23 Thread Ilya Dryomov
This is only present to size the temporary scratch arrays that we put on the stack. Let the caller allocate them as they wish and remove the limitation. Reflects ceph.git commit 1cfe140bf2dab99517589a82a916f4c75b9492d1. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux

[PATCH 07/19] crush: use breadth-first search for indep mode

2013-12-23 Thread Ilya Dryomov
Reflects ceph.git commit 86e978036a4ecbac4c875e7c00f6c5bbe37282d3. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/crush/crush.h |3 +- net/ceph/crush/mapper.c | 172 --- 2 files changed, 165 insertions(+), 10 deletions

[PATCH 11/19] crush: pass parent r value for indep call

2013-12-23 Thread Ilya Dryomov
-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c index 125dbd04f2b6..c727836b5860 100644 --- a/net/ceph/crush/mapper.c +++ b/net/ceph/crush

[PATCH 01/19] crush: pass weight vector size to map function

2013-12-23 Thread Ilya Dryomov
tolerate previous bad osdmaps that got into this state. It's also a bit more defensive. Reflects ceph.git commit 5922e2c2b8335b5e46c9504349c3a55b7434c01a. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/crush/mapper.h |2 +- net/ceph/crush/mapper.c | 17

[PATCH 00/19] crush kernel update

2013-12-23 Thread Ilya Dryomov
introduced by crush/mapper: finish adding choose_local_[fallback_]tries Available from wip-crush-2 branch of ceph-client.git. Thanks, Ilya Ilya Dryomov (19): crush: pass weight vector size to map function crush: factor out (trivial) crush_destroy_rule() crush: reduce

[PATCH 03/19] crush: reduce scope of some local variables

2013-12-23 Thread Ilya Dryomov
Reflects ceph.git commit e7d47827f0333c96ad43d257607fb92ed4176550. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c index 18d2cf66f102

[PATCH 16/19] crush: generalize descend_once

2013-12-23 Thread Ilya Dryomov
with indep mode. Reflects ceph.git commit 685c6950ef3df325ef04ce7c986e36ca2514c5f1. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/net/ceph/crush/mapper.c b/net/ceph

[PATCH 13/19] crush: apply chooseleaf_tries to firstn mode too

2013-12-23 Thread Ilya Dryomov
. In contrast, for indep, if tries is not specified we default to 1 recursive attempt, because that is simply more sane, and we have the option to do so. The descend_once tunable has no effect for indep. Reflects ceph.git commit 64aeded50d80942d66a5ec7b604ff2fcbf5d7b63. Signed-off-by: Ilya Dryomov

[PATCH 14/19] crush: add SET_CHOOSE_TRIES rule step

2013-12-23 Thread Ilya Dryomov
Since we can specify the recursive retries in a rule, we may as well also specify the non-recursive tries too for completeness. Reflects ceph.git commit d1b97462cffccc871914859eaee562f2786abfd1. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/crush/crush.h |3 ++- net

[PATCH 15/19] crush: CHOOSE_LEAF -> CHOOSELEAF throughout

2013-12-23 Thread Ilya Dryomov
This aligns the internal identifier names with the user-visible names in the decompiled crush map language. Reflects ceph.git commit caa0e22e15e4226c3671318ba1f61314bf6da2a6. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/crush/crush.h |6 +++--- net/ceph/crush

[PATCH 18/19] crush: attempts -> tries

2013-12-23 Thread Ilya Dryomov
Reflects ceph.git commit ea3a0bb8b773360d73b8b77fa32115ef091c9857. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c index

[PATCH 19/19] crush: fix crush_choose_firstn comment

2013-12-23 Thread Ilya Dryomov
Reflects ceph.git commit 8b38f10bc2ee3643a33ea5f9545ad5c00e4ac5b4. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c index 71ce4f12a7c9

[PATCH 17/19] crush: add set_choose_local_[fallback_]tries steps

2013-12-23 Thread Ilya Dryomov
This allows all of the tunables to be overridden by a specific rule. Reflects ceph.git commits d129e09e57fbc61cfd4f492e3ee77d0750c9d292, 0497db49e5973b50df26251ed0e3f4ac7578e66e. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/crush/crush.h

[PATCH 0/2] ceph_features.h kernel update

2013-12-24 Thread Ilya Dryomov
, Ilya Ilya Dryomov (2): libceph: all features fields must be u64 libceph: update ceph_features.h fs/ceph/mds_client.c | 14 +++--- fs/ceph/super.c|4 +- include/linux/ceph/ceph_features.h | 96 include/linux/ceph

[PATCH 1/2] libceph: all features fields must be u64

2013-12-24 Thread Ilya Dryomov
In preparation for ceph_features.h update, change all features fields from unsigned int/u32 to u64. (ceph.git has ~40 feature bits at this point.) Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- fs/ceph/mds_client.c | 14 +++--- fs/ceph/super.c|4
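
A small sketch of why 32-bit feature fields stop working once feature bits go past bit 31. The bit position used for `EXAMPLE_FEATURE_HIGH` is hypothetical (the real assignments live in ceph_features.h), and both helper names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit above bit 31, for illustration only. */
#define EXAMPLE_FEATURE_HIGH (1ULL << 35)

/* Feature test on a full 64-bit mask. */
bool has_feature_u64(uint64_t features, uint64_t bit)
{
	return (features & bit) == bit;
}

/* Same test after the mask was stored in a 32-bit field: high bits are gone. */
bool has_feature_u32(uint32_t features, uint64_t bit)
{
	return ((uint64_t)features & bit) == bit;
}
```

Storing a peer's mask in a u32 silently drops every bit from 32 upward, so a check for such a feature always fails.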

[PATCH 2/2] libceph: update ceph_features.h

2013-12-24 Thread Ilya Dryomov
commit 053659d05e0349053ef703b414f44965f368b9f0. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/ceph_features.h | 96 net/ceph/messenger.c |4 +- 2 files changed, 67 insertions(+), 33 deletions(-) diff --git

[PATCH] crush: support new indep mode and SET_* steps (crush v2) by default

2013-12-24 Thread Ilya Dryomov
Add CRUSH_V2 feature (new indep mode and SET_* steps) to a set of features supported by default. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/ceph_features.h |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/ceph/ceph_features.h

[PATCH] libceph: use CEPH_MON_PORT when the specified port is 0

2013-12-31 Thread Ilya Dryomov
Similar to userspace, don't bail with "parse_ips bad ip ..." if the specified port is 0; instead, use CEPH_MON_PORT (6789, the default monitor port). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/messenger.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion
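
The described behaviour can be sketched roughly as follows. This is not the parse_ips() code from the patch; `effective_mon_port()` is a hypothetical helper, and rejecting values outside 0..65535 is an assumption made for the example.

```c
#define CEPH_MON_PORT 6789 /* default monitor port, per the commit message */

/*
 * Map a parsed port number to the port that should actually be used.
 * Returns -1 for values outside the valid TCP port range (an assumption
 * of this sketch), the default monitor port for 0, and the parsed value
 * otherwise.
 */
int effective_mon_port(long parsed_port)
{
	if (parsed_port < 0 || parsed_port > 65535)
		return -1;
	if (parsed_port == 0)
		return CEPH_MON_PORT;
	return (int)parsed_port;
}
```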

Re: Ceph fixes for 3.10.y

2014-01-07 Thread Ilya Dryomov
On Tue, Jan 7, 2014 at 12:09 AM, Greg KH g...@kroah.com wrote: On Tue, Dec 31, 2013 at 08:21:19AM -0800, Sage Weil wrote: Hi Greg, This is a somewhat long overdue set of fixes for 3.10.y. Since there are a lot of patches, they can be pulled from

[PATCH 0/3] libceph: #5425 fix

2014-01-13 Thread Ilya Dryomov
of xfstests in a configuration that previously took ~5 hours to make it pop up and should probably be committed w/o waiting for part 2. Thanks, Ilya Ilya Dryomov (3): libceph: rename ceph_msg::front_max to front_alloc_len libceph: rename front to front_len in get_reply

[PATCH 3/3] libceph: fix preallocation check in get_reply()

2014-01-13 Thread Ilya Dryomov
another bug, leads to forever hung tasks and forced reboots. Fix this by comparing front_len with front_alloc_len field of struct ceph_msg, which stores the actual size of the buffer. Fixes: http://tracker.ceph.com/issues/5425 Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph
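
A minimal sketch of the kind of check described here: the incoming front length is compared against the allocated size of the buffer rather than against a length field that may change as the message is used. `struct msg_sketch` and `reply_fits()` are hypothetical stand-ins, not the real struct ceph_msg or get_reply().

```c
#include <stdbool.h>
#include <stddef.h>

struct msg_sketch {
	size_t front_len;       /* bytes currently in use (hypothetical field) */
	size_t front_alloc_len; /* actual size of the allocated buffer */
};

/* True if a reply with the given front length fits the preallocated buffer. */
bool reply_fits(const struct msg_sketch *msg, size_t incoming_front_len)
{
	return incoming_front_len <= msg->front_alloc_len;
}
```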

[PATCH] libceph: add ceph_kv{malloc,free}() and switch to them

2014-01-14 Thread Ilya Dryomov
: - for buffers (ceph_buffer_new()), from trying to kmalloc() everything and using vmalloc() just as a fallback - for messages (ceph_msg_new()), from going to vmalloc() for anything bigger than a page - for messages (ceph_msg_new()), from disallowing vmalloc() to use high memory Signed-off-by: Ilya

Re: v0.75 released

2014-01-16 Thread Ilya Dryomov
On Wed, Jan 15, 2014 at 5:42 AM, Sage Weil s...@inktank.com wrote: [...] * rbd: support for 4096 mapped devices, up from ~250 (Ilya Dryomov) Just a note, v0.75 simply adds some of the infrastructure, the actual support for this will arrive with kernel 3.14. The theoretical limit is 65536
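
For reference, the 65536 figure is consistent with a 20-bit minor number space split into 16 minors per device, which also matches the listing quoted in the "enable extended devt" patch, where rbd2 sits at minor 32. The sketch below assumes exactly that layout; the constant and function names are made up for the example.

```c
#define MINOR_BITS_SKETCH 20 /* Linux minor numbers are 20 bits wide */
#define PART_SHIFT_SKETCH 4  /* assumption: 16 minors reserved per rbd device */

/* First (whole-disk) minor number for a given rbd id. */
unsigned int dev_id_to_minor(unsigned int dev_id)
{
	return dev_id << PART_SHIFT_SKETCH;
}

/* Inverse mapping: which rbd id does a minor number belong to? */
unsigned int minor_to_dev_id(unsigned int minor)
{
	return minor >> PART_SHIFT_SKETCH;
}

/* Number of devices that fit under a single major with this layout. */
unsigned int max_devices(void)
{
	return 1U << (MINOR_BITS_SKETCH - PART_SHIFT_SKETCH);
}
```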

Re: [ceph-users] v0.75 released

2014-01-17 Thread Ilya Dryomov
On Fri, Jan 17, 2014 at 11:20 AM, Ilya Dryomov ilya.dryo...@inktank.com wrote: On Fri, Jan 17, 2014 at 2:05 AM, Christian Balzer ch...@gol.com wrote: On Thu, 16 Jan 2014 15:51:17 +0200 Ilya Dryomov wrote: On Wed, Jan 15, 2014 at 5:42 AM, Sage Weil s...@inktank.com wrote: [...] * rbd

[PATCH] libceph: dout() is missing a newline

2014-01-17 Thread Ilya Dryomov
Add a missing newline to dout() in __reset_osd(). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osd_client.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 733195170490..959d332ed534 100644

[PATCH 2/3] libceph: introduce con_fault_raise() and switch to it

2014-01-17 Thread Ilya Dryomov
In preparation for connect timeout, abstract ceph_connection fault-initiating logic into a separate function and start using it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/messenger.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net

[PATCH 3/3] libceph: handle dead tcp connections during connection negotiation

2014-01-17 Thread Ilya Dryomov
-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/messenger.h |5 + net/ceph/ceph_common.c |1 + net/ceph/messenger.c | 31 +++ 3 files changed, 37 insertions(+) diff --git a/include/linux/ceph/messenger.h b/include/linux

[PATCH 0/3] libceph: #7139 fix

2014-01-17 Thread Ilya Dryomov
Hello, This series fixes #7139 (Linux kernel client, dead tcp connections during connection negotiation may lead to hangs). 1/3 and 2/3 are preparatory patches, 3/3 is the actual fix. Thanks, Ilya Ilya Dryomov (3): libceph: add function names to timeout dout()s libceph

Re: [PATCH 0/3] libceph: #7139 fix

2014-01-20 Thread Ilya Dryomov
On Fri, Jan 17, 2014 at 3:28 PM, Ilya Dryomov ilya.dryo...@inktank.com wrote: Hello, This series fixes #7139 (Linux kernel client, dead tcp connections during connection negotiation may lead to hangs). 1/3 and 2/3 are preparatory patches, 3/3 is the actual fix. Please ignore this, I'm going

Re: CEPH_FEATURE_OSD_CACHEPOOL always required when osd cache pool exist ?

2014-01-21 Thread Ilya Dryomov
On Tue, Jan 21, 2014 at 10:28 AM, Laurent Barbe laur...@ksperis.com wrote: Hi all, About 0.75: when I create a cache pool, kernel clients like rbd or cephfs fail with a "feature set mismatch" error. (Even if the device is not on the cached pool.) $ rbd map rbd/myrbd => OK $ ceph osd pool create

[PATCH v2 0/3] libceph: #7139 fix

2014-01-21 Thread Ilya Dryomov
in con_close_socket() (3/3) Thanks, Ilya Ilya Dryomov (3): libceph: add function names to timeout dout()s libceph: introduce con_fault_raise() and switch to it libceph: handle dead tcp connections during connection negotiation include/linux/ceph/messenger.h |5 net/ceph

[PATCH v2 2/3] libceph: introduce con_fault_raise() and switch to it

2014-01-21 Thread Ilya Dryomov
In preparation for connect timeout, abstract ceph_connection fault-initiating logic into a separate function and start using it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/messenger.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/ceph

[PATCH v2 3/3] libceph: handle dead tcp connections during connection negotiation

2014-01-21 Thread Ilya Dryomov
-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/messenger.h |5 + net/ceph/ceph_common.c |1 + net/ceph/messenger.c | 42 ++-- 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/include/linux/ceph

Re: [PATCH v2 3/3] libceph: handle dead tcp connections during connection negotiation

2014-01-26 Thread Ilya Dryomov
On Sat, Jan 25, 2014 at 1:07 AM, Sage Weil s...@inktank.com wrote: On Tue, 21 Jan 2014, Ilya Dryomov wrote: Keepalive mechanism that we are currently using doesn't handle dead (e.g. half-open in RFC 793 sense) TCP connections: a) it's based on pending ceph_osd_requests which

Re: [PATCH v2 3/3] libceph: handle dead tcp connections during connection negotiation

2014-01-26 Thread Ilya Dryomov
I've looked into this some more and I now think the connect_timeout approach is not the best way to fix this. I think a better way is to go through all connection and negotiation waits on both rbd and cephfs sides and make sure they are interruptible and have a timeout attached. Only if that turns out

[PATCH 02/11] libceph: move ceph_file_layout helpers to ceph_fs.h

2014-01-27 Thread Ilya Dryomov
Move ceph_file_layout helper macros and inline functions to ceph_fs.h. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/ceph_fs.h | 23 +++ include/linux/ceph/osdmap.h | 27 --- 2 files changed, 23 insertions(+), 27

[PATCH 03/11] libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN

2014-01-27 Thread Ilya Dryomov
In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- drivers/block/rbd.c |6 +++--- include/linux/ceph/osd_client.h |4 ++-- net/ceph/osd_client.c |2 +- 3 files

[PATCH 00/11] tiering support

2014-01-27 Thread Ilya Dryomov
Ilya Dryomov (11): libceph: start using oloc abstraction libceph: move ceph_file_layout helpers to ceph_fs.h libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN libceph: introduce and start using oid abstraction libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg() libceph

[PATCH 06/11] libceph: CEPH_OSD_FLAG_* enum update

2014-01-27 Thread Ilya Dryomov
Update CEPH_OSD_FLAG_* enum. (We need CEPH_OSD_FLAG_IGNORE_OVERLAY to support tiering). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/rados.h |4 1 file changed, 4 insertions(+) diff --git a/include/linux/ceph/rados.h b/include/linux/ceph/rados.h index

[PATCH 09/11] libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}

2014-01-27 Thread Ilya Dryomov
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before introducing r_target_{oloc,oid} needed for redirects. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- drivers/block/rbd.c |8 include/linux/ceph/osd_client.h |4 ++-- net/ceph/debugfs.c

[PATCH 07/11] libceph: add ceph_pg_pool_by_id()

2014-01-27 Thread Ilya Dryomov
Lookup pool info by ID function is hidden in osdmap.c. Expose it to the rest of libceph. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h |3 +++ net/ceph/osdmap.c |5 + 2 files changed, 8 insertions(+) diff --git a/include/linux/ceph

[PATCH 05/11] libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()

2014-01-27 Thread Ilya Dryomov
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- fs/ceph/ioctl.c |8 ++-- include/linux/ceph/osdmap.h |7 +-- net/ceph

[PATCH 11/11] libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature

2014-01-27 Thread Ilya Dryomov
Announce our (limited, see previous commit) support for CACHEPOOL feature. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/ceph_features.h |1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/ceph/ceph_features.h b/include/linux/ceph/ceph_features.h

[PATCH 08/11] libceph: follow {read,write}_tier fields on osd request submission

2014-01-27 Thread Ilya Dryomov
}_tier are ignored. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h |2 ++ net/ceph/osd_client.c | 30 -- net/ceph/osdmap.c | 28 +--- 3 files changed, 55 insertions(+), 5 deletions

[PATCH 10/11] libceph: follow redirect replies from osds

2014-01-27 Thread Ilya Dryomov
doesn't exist yet, and hence this commit adds support for pool redirects only. To make sure that future server-side updates don't break us, we decode all fields and, if any of key, nspace, hash or oid have a non-default value, error out with "corrupt osd_op_reply ..." message. Signed-off-by: Ilya

[PATCH 01/11] libceph: start using oloc abstraction

2014-01-27 Thread Ilya Dryomov
at this point we only send (i.e. encode) olocs and never have to receive (i.e. decode) them. This makes keeping a copy of ceph_file_layout in every osd request unnecessary, so ceph_osd_request::r_file_layout field is nuked. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- drivers/block/rbd.c

Re: [PATCH 07/11] libceph: add ceph_pg_pool_by_id()

2014-01-27 Thread Ilya Dryomov
On Mon, Jan 27, 2014 at 6:38 PM, Sage Weil s...@inktank.com wrote: Would it make more sense to just rename and export the existing function? I'm not sure __ is particularly meaningful in the context of osdmap.c... I added a new one because __lookup_pg_pool() takes rb_root, whereas all existing

Re: [PATCH 10/11] libceph: follow redirect replies from osds

2014-01-27 Thread Ilya Dryomov
On Mon, Jan 27, 2014 at 8:32 PM, Sage Weil s...@inktank.com wrote: On Mon, 27 Jan 2014, Ilya Dryomov wrote: Follow redirect replies from osds, for details see ceph.git commit fbbe3ad1220799b7bb00ea30fce581c5eadaf034. v1 (current) version of redirect reply consists of oloc and oid, which

[PATCH] ceph: fix dout() compile warnings in ceph_filemap_fault()

2014-01-28 Thread Ilya Dryomov
PAGE_CACHE_SIZE is unsigned long on all architectures, however size_t is either unsigned int or unsigned long. Rather than change format strings, cast PAGE_CACHE_SIZE to size_t to be in line with dout()s in ceph_page_mkwrite(). Cc: Yan, Zheng zheng.z@intel.com Signed-off-by: Ilya Dryomov

[PATCH] ceph: cast PAGE_SIZE to size_t in ceph_sync_write()

2014-01-28 Thread Ilya Dryomov
Use min_t(size_t, ...) instead of plain min(), which does strict type checking, to avoid compile warning on i386. Cc: Jianpeng Ma majianp...@gmail.com Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- fs/ceph/file.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs
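
A userspace sketch of the idea: cast both operands to one type before comparing, so that a size_t value and an unsigned long constant can be mixed on architectures where those types differ. `MIN_T` is a simplified stand-in for the kernel's min_t() (it evaluates its arguments more than once), and `PAGE_SIZE_SKETCH` and `chunk_len()` are hypothetical names.

```c
#include <stddef.h>

/* Simplified stand-in for min_t(): compare after casting both sides. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

#define PAGE_SIZE_SKETCH 4096UL /* unsigned long, like the kernel's PAGE_SIZE */

/* Length of the next chunk to write: at most one page. */
size_t chunk_len(size_t remaining)
{
	return MIN_T(size_t, remaining, PAGE_SIZE_SKETCH);
}
```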

Re: [GIT PULL] Ceph updates for -rc1

2014-01-29 Thread Ilya Dryomov
On Wed, Jan 29, 2014 at 4:30 PM, Sage Weil s...@inktank.com wrote: On Tue, 28 Jan 2014, Sage Weil wrote: Hi Linus, On Tue, 28 Jan 2014, Linus Torvalds wrote: On Tue, Jan 28, 2014 at 1:10 PM, Dave Jones da...@redhat.com wrote: This breaks the build for me. It is my merge

[PATCH v2] ceph: fix posix ACL hooks

2014-01-29 Thread Ilya Dryomov
From: Sage Weil s...@inktank.com The merge of 7221fe4c2 raced with upstream changes in the generic POSIX ACL code (2aeccbe95). Update Ceph to use the new helpers as well by dropping the now-generic functions and setting the set_acl inode op. Signed-off-by: Sage Weil s...@inktank.com --- v2:

Re: [PATCH v2] ceph: fix posix ACL hooks

2014-01-29 Thread Ilya Dryomov
On Wed, Jan 29, 2014 at 6:37 PM, Ilya Dryomov ilya.dryo...@inktank.com wrote: From: Sage Weil s...@inktank.com The merge of 7221fe4c2 raced with upstream changes in the generic POSIX ACL code (2aeccbe95). Update Ceph to use the new helpers as well by dropping the now-generic functions

[PATCH 3/4] libceph: factor out logic from ceph_osdc_start_request()

2014-02-03 Thread Ilya Dryomov
Factor out logic from ceph_osdc_start_request() into a new helper, __ceph_osdc_start_request(). ceph_osdc_start_request() now amounts to taking locks and calling __ceph_osdc_start_request(). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osd_client.c | 62

[PATCH 2/4] libceph: a per-osdc crush scratch buffer

2014-02-03 Thread Ilya Dryomov
. This shouldn't result in any contention, because all osd requests were already serialized by request_mutex at that point; the only unlocked caller was ceph_ioctl_get_dataloc(). Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osdmap.h |3 +++ net/ceph/osdmap.c

[PATCH 0/4] Misc patches for 3.14-rc2

2014-02-03 Thread Ilya Dryomov
Hello, - 1/4 is a simple error path fix - 2/4 replaces VLA with a heap buffer - 3/4 and 4/4 eliminate a possible race condition in redirect reply handling I think these can go into rc2 after some testing. Thanks, Ilya Ilya Dryomov (4): libceph: fix error handling

[PATCH 4/4] libceph: take map_sem for read in handle_reply()

2014-02-03 Thread Ilya Dryomov
, crush_mutex.) Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osd_client.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 2aa82b6bb305..0676f2b199d6 100644 --- a/net/ceph/osd_client.c

[PATCH 1/4] libceph: fix error handling in ceph_osdc_init()

2014-02-03 Thread Ilya Dryomov
msgpool_op_reply message pool isn't destroyed if workqueue construction fails. Fix it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/osd_client.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index
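
The usual shape of such a fix is a goto-based unwind in which each failure label releases everything set up before it. The sketch below shows that pattern in userspace; `struct client_sketch`, `client_init()`, `client_destroy()` and the `fail_workqueue` switch are hypothetical and only loosely modelled on the description above.

```c
#include <stdlib.h>

struct client_sketch {
	void *req_pool;
	void *reply_pool;
	void *workqueue;
};

/*
 * Set up three resources in order. If a later step fails, everything
 * allocated by the earlier steps is released before returning -1.
 * fail_workqueue simulates a workqueue construction failure.
 */
int client_init(struct client_sketch *c, int fail_workqueue)
{
	c->req_pool = c->reply_pool = c->workqueue = NULL;

	c->req_pool = malloc(16);
	if (!c->req_pool)
		goto out;
	c->reply_pool = malloc(16);
	if (!c->reply_pool)
		goto out_req_pool;
	c->workqueue = fail_workqueue ? NULL : malloc(16);
	if (!c->workqueue)
		goto out_reply_pool; /* the reply pool must not be leaked here */
	return 0;

out_reply_pool:
	free(c->reply_pool);
	c->reply_pool = NULL;
out_req_pool:
	free(c->req_pool);
	c->req_pool = NULL;
out:
	return -1;
}

/* Release everything a successful client_init() set up. */
void client_destroy(struct client_sketch *c)
{
	free(c->workqueue);
	free(c->reply_pool);
	free(c->req_pool);
	c->req_pool = c->reply_pool = c->workqueue = NULL;
}
```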

Re: [PATCH 2/4] libceph: a per-osdc crush scratch buffer

2014-02-03 Thread Ilya Dryomov
On Mon, Feb 3, 2014 at 6:27 PM, Sage Weil s...@inktank.com wrote: On Mon, 3 Feb 2014, Ilya Dryomov wrote: With the addition of erasure coding support in the future, scratch variable-length array in crush_do_rule_ary() is going to grow to at least 200 bytes on average, on top of another 128

[PATCH] libceph: do not dereference a NULL bio pointer

2014-02-05 Thread Ilya Dryomov
Commit f38a5181d9f3 (ceph: Convert to immutable biovecs) introduced a NULL pointer dereference, which broke rbd in -rc1. Fix it. Cc: Kent Overstreet k...@daterainc.com Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/messenger.c |8 ++-- 1 file changed, 6 insertions

Re: rbd: use watch/notify for changes in rbd header

2014-02-13 Thread Ilya Dryomov
On Thu, Feb 13, 2014 at 11:50 AM, Dan Carpenter dan.carpen...@oracle.com wrote: Hello Yehuda Sadeh, The patch 59c2be1e4d42: rbd: use watch/notify for changes in rbd header from Mar 21, 2011, leads to the following static checker warning: drivers/block/rbd.c:687

[PATCH 3/6] libceph: bump CEPH_OSD_MAX_OP to 3

2014-02-21 Thread Ilya Dryomov
Our longest osd request now contains 3 ops: copyup+hint+write. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osd_client.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index

[PATCH 6/6] rbd: prefix rbd writes with CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-21 Thread Ilya Dryomov
In an effort to reduce fragmentation, prefix every rbd write with a CEPH_OSD_OP_SETALLOCHINT osd op with an expected_write_size value set to the object size (1 << order). Backwards compatibility is taken care of on the libceph/osd side. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
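
As a quick illustration of the object size computation, the sketch below derives the size from the object order; order 22 gives the 4 MiB objects mentioned elsewhere in this thread. `object_size_from_order()` is a hypothetical helper, and returning 0 for orders of 64 or more is a guard added only for the example.

```c
#include <stdint.h>

/* Object size in bytes for a given order, or 0 if the shift would overflow. */
uint64_t object_size_from_order(uint8_t order)
{
	if (order >= 64)
		return 0;
	return (uint64_t)1 << order;
}
```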

[PATCH 4/6] rbd: do not hard-code CEPH_OSD_MAX_OP in rbd_osd_req_callback()

2014-02-21 Thread Ilya Dryomov
CEPH_OSD_MAX_OP value in rbd_osd_req_callback() is hard-coded to 2. Fix it. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- drivers/block/rbd.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index b365e0dfccb6

[PATCH 2/6] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-21 Thread Ilya Dryomov
workloads have very high levels of fragmentation in xfs and this would mitigate that and probably have a pretty nice performance benefit. SETALLOCHINT is considered advisory, so our backwards compatibility mechanism here is to set FAILOK flag for all SETALLOCHINT ops. Signed-off-by: Ilya Dryomov

[PATCH 0/6] libceph: CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-21 Thread Ilya Dryomov
Hello, This series adds support for CEPH_OSD_OP_SETALLOCHINT osd op to libceph along with adjusting rbd to make use of it. The rationale and the basic design was outlined in the rados io hints thread on ceph-devel about a month ago. Thanks, Ilya Ilya Dryomov (6): libceph

[PATCH 1/6] libceph: encode CEPH_OSD_OP_FLAG_* op flags

2014-02-21 Thread Ilya Dryomov
Encode ceph_osd_op::flags field so that it gets sent over the wire. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/osd_client.h |1 + include/linux/ceph/rados.h |2 +- net/ceph/osd_client.c |2 ++ 3 files changed, 4 insertions(+), 1

[PATCH 5/6] rbd: num_ops parameter for rbd_osd_req_create()

2014-02-21 Thread Ilya Dryomov
In preparation for prefixing rbd writes with an allocation hint, introduce a num_ops parameter for rbd_osd_req_create(). The rationale is that not every write request is a write op that needs to be prefixed (e.g. watch op), so the num_ops logic needs to be in the callers. Signed-off-by: Ilya

Re: [PATCH 2/6] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: This is primarily for rbd's benefit and is supposed to combat fragmentation: ... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs

Re: [PATCH 4/6] rbd: do not hard-code CEPH_OSD_MAX_OP in rbd_osd_req_callback()

2014-02-25 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: CEPH_OSD_MAX_OP value in rbd_osd_req_callback() is hard-coded to 2. Fix it. Please squash this in with the previous patch (at least). Change the BUG_ON() to rbd_assert() while you're

Re: [PATCH 0/6] libceph: CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 4:58 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: Hello, This series adds support for CEPH_OSD_OP_SETALLOCHINT osd op to libceph along with adjusting rbd to make use of it. The rationale and the basic design was outlined

Re: [PATCH 6/6] rbd: prefix rbd writes with CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
On Mon, Feb 24, 2014 at 4:59 PM, Alex Elder el...@ieee.org wrote: On 02/21/2014 12:55 PM, Ilya Dryomov wrote: In an effort to reduce fragmentation, prefix every rbd write with a CEPH_OSD_OP_SETALLOCHINT osd op with an expected_write_size value set to the object size (1 << order). Backwards
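For reference, the object size used as the hint value is a power of two derived from the image's object order. A minimal sketch, assuming a hypothetical helper name `sketch_object_size()` that does not appear in the patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn an object order into an object size in bytes.
 * The caller must pass an order below 64.
 */
static uint64_t sketch_object_size(uint8_t order)
{
	/* Widen to 64 bits before shifting so large orders do not overflow an int. */
	return (uint64_t)1 << order;
}
```

With an order of 22 this gives 4194304 bytes, which matches the 4m object size mentioned elsewhere in the thread.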

Re: [PATCH 2/6] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
On Tue, Feb 25, 2014 at 3:05 PM, Alex Elder el...@ieee.org wrote: The other thing is that the expected size is limited by rbd_image_header->obj_order, which is a single byte. I think you should encode this the same way. Even if the hint were for more than RBD, this level of granularity may

[PATCH v2 2/5] libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
workloads have very high levels of fragmentation in xfs and this would mitigate that and probably have a pretty nice performance benefit. SETALLOCHINT is considered advisory, so our backwards compatibility mechanism here is to set FAILOK flag for all SETALLOCHINT ops. Signed-off-by: Ilya Dryomov
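The advisory semantics can be illustrated with a small model of how a per-op "failure is OK" flag affects the overall result of a multi-op request. This is a sketch under stated assumptions, not libceph or OSD code: `struct sketch_op`, `SKETCH_FLAG_FAILOK` (including its value) and `sketch_request_result()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_FAILOK 0x2u	/* hypothetical flag bit for this sketch */

struct sketch_op {
	uint32_t flags;	/* per-op flags */
	int result;	/* 0 on success, negative errno-style value on failure */
};

/*
 * Return 0 if the request as a whole succeeds, otherwise the first error
 * from an op that was not marked as allowed to fail.
 */
static int sketch_request_result(const struct sketch_op *ops, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ops[i].result < 0 && !(ops[i].flags & SKETCH_FLAG_FAILOK))
			return ops[i].result;
	}
	return 0;
}
```

Under this model, a server that does not recognize the hint op can fail it without failing the write that follows, which is the backwards compatibility behaviour the message describes.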

[PATCH v2 4/5] rbd: num_ops parameter for rbd_osd_req_create()

2014-02-25 Thread Ilya Dryomov
In preparation for prefixing rbd writes with an allocation hint introduce a num_ops parameter for rbd_osd_req_create(). The rationale is that not every write request is a write op that needs to be prefixed (e.g. watch op), so the num_ops logic needs to be in the callers. Signed-off-by: Ilya

[PATCH v2 0/5] libceph: CEPH_OSD_OP_SETALLOCHINT osd op

2014-02-25 Thread Ilya Dryomov
. This posting is simply to reflect the state of the series after a thorough review by Alex. Thanks, Ilya Ilya Dryomov (5): libceph: encode CEPH_OSD_OP_FLAG_* op flags libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op libceph: bump CEPH_OSD_MAX_OP to 3 rbd: num_ops

[PATCH v2 1/5] libceph: encode CEPH_OSD_OP_FLAG_* op flags

2014-02-25 Thread Ilya Dryomov
Encode ceph_osd_op::flags field so that it gets sent over the wire. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com Reviewed-by: Sage Weil s...@inktank.com Reviewed-by: Alex Elder el...@linaro.org --- include/linux/ceph/osd_client.h |1 + include/linux/ceph/rados.h |2 +- net

[PATCH] rbd: fix error paths in rbd_img_request_fill()

2014-03-03 Thread Ilya Dryomov
://tracker.ceph.com/issues/7327 Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- drivers/block/rbd.c |9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index b365e0dfccb6..53d492e83586 100644 --- a/drivers/block/rbd.c +++ b/drivers

Re: [PATCH] rbd: fix error paths in rbd_img_request_fill()

2014-03-04 Thread Ilya Dryomov
On Mon, Mar 3, 2014 at 11:59 PM, Alex Elder el...@ieee.org wrote: On 03/03/2014 09:38 AM, Ilya Dryomov wrote: Doing rbd_obj_request_put() in rbd_img_request_fill() error paths is not only insufficient, but also triggers an rbd_assert() in rbd_obj_request_destroy(): Assertion failure

Re: CMake blueprint

2014-03-05 Thread Ilya Dryomov
On Wed, Mar 5, 2014 at 6:04 PM, Casey Bodley ca...@linuxbox.com wrote: Hi Ilya, Regarding the CMake blueprint at http://wiki.ceph.com/Planning/Blueprints/Giant/CMake, we at The Linux Box are excited to see more interest! I know that we've made several improvements to the CMakeLists on

[PATCH 3/5] crush: add chooseleaf_vary_r tunable

2014-03-19 Thread Ilya Dryomov
was seeing PGs stuck in active+remapped after reweight-by-utilization because the up set mapped to a single OSD. Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/crush/crush.h |6 ++ net/ceph/crush

[PATCH 1/5] crush: fix off-by-one errors in total_tries refactor

2014-03-19 Thread Ilya Dryomov
commit 795704fd615f0b008dcc81aa088a859b2d075138. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- net/ceph/crush/mapper.c | 46 +++--- 1 file changed, 27 insertions(+), 19 deletions(-) diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c

[PATCH 5/5] crush: support chooseleaf_vary_r tunable (tunables3) by default

2014-03-19 Thread Ilya Dryomov
Add TUNABLES3 feature (chooseleaf_vary_r tunable) to a set of features supported by default. Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com --- include/linux/ceph/ceph_features.h | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/include/linux/ceph
