Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
'ceph osd crush tunables optimal' or adjust an offline map file via the crushtool command line (more annoying) and retest; I suspect that is the problem. http://ceph.com/docs/master/rados/operations/crush-map/#tunables That solves the bug with weight 0, thanks. But I still get the

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Sage Weil
On Mon, 6 Jan 2014, Dietmar Maurer wrote: 'ceph osd crush tunables optimal' or adjust an offline map file via the crushtool command line (more annoying) and retest; I suspect that is the problem. http://ceph.com/docs/master/rados/operations/crush-map/#tunables That solves the bug

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
A host with only one OSD gets too much data. I think this is just fundamentally a problem with distributing 3 replicas over only 4 hosts. Every piece of data in the system needs to include either host 3 or 4 (and thus device 4 or 5) in order to have 3 replicas (on separate hosts). Add
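To make the imbalance concrete, here is a rough back-of-the-envelope sketch (my arithmetic, not from the thread, assuming 4 hosts of roughly equal weight and 3 replicas):

    hosts = 4, replicas = 3
    # every PG must place its 3 replicas on 3 distinct hosts,
    # so each host ends up in roughly 3/4 of all PGs
    share(host)              ~ 3/4
    share(OSD on 1-OSD host) ~ 3/4   # the whole host share lands on one disk
    share(OSD on 2-OSD host) ~ 3/8   # the host share is split across two disks
    # -> the lone OSD stores roughly 2x the data of each of its peers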

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
I think this is just fundamentally a problem with distributing 3 replicas over only 4 hosts. Every piece of data in the system needs to include either host 3 or 4 (and thus device 4 or 5) in order to have 3 replicas (on separate hosts). Add more hosts or disks and the distribution will

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-03 Thread Sage Weil
Run 'ceph osd crush tunables optimal' or adjust an offline map file via the crushtool command line (more annoying) and retest; I suspect that is the problem. http://ceph.com/docs/master/rados/operations/crush-map/#tunables sage On Fri, 3 Jan 2014, Dietmar Maurer wrote: In both cases, you
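Concretely, the two routes might look like this (a sketch: the map file names are placeholders, and the tunable values shown are the documented "optimal"/bobtail defaults of the era, so treat them as an assumption):

    # online: switch the whole cluster to the current optimal tunables
    ceph osd crush tunables optimal

    # offline: extract the map, adjust the tunables, and retest
    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin \
        --set-choose-local-tries 0 \
        --set-choose-local-fallback-tries 0 \
        --set-choose-total-tries 50 \
        --set-chooseleaf-descend-once 1 \
        -o crushmap.new
    crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-utilization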

[ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
I try to understand the default crush rule:

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host
            step emit
    }

Is this the same as:

    rule data {
            ruleset 0
            type
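The comparison is cut off above, but the usual expansion (per the CRUSH documentation) replaces the single chooseleaf step with nested choose steps:

    # chooseleaf form: pick N hosts and descend to one OSD under each
    step take default
    step chooseleaf firstn 0 type host
    step emit

    # nested choose form: pick N hosts, then pick 1 OSD inside each
    step take default
    step choose firstn 0 type host
    step choose firstn 1 type osd
    step emit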

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Wido den Hollander
On 01/02/2014 10:40 AM, Dietmar Maurer wrote: I try to understand the default crush rule:

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host
            step emit
    }

Is this the same

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Sage Weil
On Thu, 2 Jan 2014, Wido den Hollander wrote: On 01/02/2014 10:40 AM, Dietmar Maurer wrote: I try to understand the default crush rule:

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Sage Weil
On Thu, 2 Jan 2014, Dietmar Maurer wrote: iirc, chooseleaf goes down the tree and descends into multiple leaves to find what you are looking for. choose goes into that leaf and tries to find what you are looking for without going into subtrees. Right. To a first approximation,

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
iirc, chooseleaf goes down the tree and descends into multiple leaves to find what you are looking for. choose goes into that leaf and tries to find what you are looking for without going into subtrees. Right. To a first approximation, these rules are equivalent. The difference is

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
The other difference is if you have one of the two OSDs on the host marked out. In the choose case, the remaining OSD will get allocated 2x the data; in the chooseleaf case, usage will remain proportional with the rest of the cluster and the data from the out OSD will be distributed across
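One way to observe this difference (a sketch; crushmap.bin and the rule number are placeholders) is to replay the map with crushtool, reweighting one device to 0 to simulate the out OSD:

    # simulate osd.4 marked out and compare per-OSD utilization
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 \
        --weight 4 0 --show-utilization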

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Sage Weil
--test IIRC. Yep! sage On Thu, 2 Jan 2014, Dietmar Maurer wrote: The other difference is if you have one of the two OSDs on the host marked out. In the choose case, the remaining OSD will get allocated 2x the data; in the chooseleaf case, usage will remain proportional with the rest

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
In both cases, you only get 2 replicas on the remaining 2 hosts. OK, I was able to reproduce this with crushtool. The difference is if you have 4 hosts with 2 osds. In the choose case, you have some fraction of the data that chose the down host in the first step (most of the attempts,

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
I also don't really understand why crush selects OSDs with weight=0:

    host prox-ceph-3 {
            id -4           # do not change unnecessarily
            # weight 3.630
            alg straw
            hash 0          # rjenkins1
            item osd.4 weight 0
    }
    root default {
            id -1           # do not
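A quick check for this (a sketch; the file name, rule number, and grep pattern are mine) is to dump the test mappings and look for the weight-0 device:

    # list every input -> [osd,...] mapping and pick out ones using osd.4;
    # under legacy tunables some mappings may still include the weight-0 device
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-mappings \
        | grep -E '\[[^]]*\b4\b[^]]*\]'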