Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2019-01-02 Thread Thomas Byrne - UKRI STFC
Assuming I understand it correctly:

"pg_upmap_items 6.0 [40,20]" refers to replacing (upmapping?) osd.40 with 
osd.20 in the acting set of the placement group '6.0'. Assuming it's a 3 
replica PG, the other two OSDs in the set remain unchanged from the CRUSH 
calculation.

"pg_upmap_items 6.6 [45,46,59,56]" describes two upmap replacements for the PG 
6.6, replacing 45 with 46, and 59 with 56.
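
If you want to double-check what a given entry does, something like this
should do it (just a sketch, using the first entry from your dump):

# up/acting sets for the PG, with the upmap exception already applied
ceph pg map 6.0

# the exceptions themselves, as already shown by `ceph osd dump | grep upmap`;
# they can also be set or removed by hand (normally the balancer manages them)
ceph osd pg-upmap-items 6.0 40 20
ceph osd rm-pg-upmap-items 6.0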

Hope that helps.

Cheers,
Tom

> -Original Message-
> From: ceph-users  On Behalf Of
> jes...@krogh.cc
> Sent: 30 December 2018 22:04
> To: Konstantin Shalygin 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Balancing cluster with large disks - 10TB HHD
> 
> >> I would still like to have a log somewhere to grep and inspect what
> >> the balancer/upmap actually does in my cluster, or some ceph
> >> commands that deliver some monitoring capability. Any suggestions?
> > Yes, in the ceph-mgr log, when the log level is DEBUG.
> 
> Tried the docs .. something like:
> 
> ceph tell mds ... does not seem to work.
> http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/
> 
> > You can get your cluster's upmaps via `ceph osd dump | grep upmap`.
> 
> Got it -- but I really need the README. It shows the map:
> ...
> pg_upmap_items 6.0 [40,20]
> pg_upmap_items 6.1 [59,57,47,48]
> pg_upmap_items 6.2 [59,55,75,9]
> pg_upmap_items 6.3 [22,13,40,39]
> pg_upmap_items 6.4 [23,9]
> pg_upmap_items 6.5 [25,17]
> pg_upmap_items 6.6 [45,46,59,56]
> pg_upmap_items 6.8 [60,54,16,68]
> pg_upmap_items 6.9 [61,69]
> pg_upmap_items 6.a [51,48]
> pg_upmap_items 6.b [43,71,41,29]
> pg_upmap_items 6.c [22,13]
> 
> ..
> 
> But I don't have any PGs that should have only 2 replicas, nor any with 4.
> How should this be interpreted?
> 
> Thanks.
> 
> --
> Jesper
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-30 Thread jesper
>> I would still like to have a log somewhere to grep and inspect what the
>> balancer/upmap actually does in my cluster, or some ceph commands that
>> deliver some monitoring capability. Any suggestions?
> Yes, in the ceph-mgr log, when the log level is DEBUG.

Tried the docs .. something like:

ceph tell mds ... does not seem to work.
http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/

> You can get your cluster's upmaps via `ceph osd dump | grep upmap`.

Got it -- but I really need the README. It shows the map:
...
pg_upmap_items 6.0 [40,20]
pg_upmap_items 6.1 [59,57,47,48]
pg_upmap_items 6.2 [59,55,75,9]
pg_upmap_items 6.3 [22,13,40,39]
pg_upmap_items 6.4 [23,9]
pg_upmap_items 6.5 [25,17]
pg_upmap_items 6.6 [45,46,59,56]
pg_upmap_items 6.8 [60,54,16,68]
pg_upmap_items 6.9 [61,69]
pg_upmap_items 6.a [51,48]
pg_upmap_items 6.b [43,71,41,29]
pg_upmap_items 6.c [22,13]

..

But I don't have any PGs that should have only 2 replicas, nor any with 4.
How should this be interpreted?

Thanks.

-- 
Jesper

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-30 Thread Konstantin Shalygin

On 12/30/18 6:48 PM, Marc Roos wrote:

You mean the values in the reweight column or the weight column? Because
from the commands in this thread I am assuming the weight column. Does
this mean that the upmap is handling disk sizes automatically?


Reweight, not weight. Weight is the CRUSH weight of a bucket; reweight is a 
local adjustment for "I have unbalanced buckets, so I need to adjust locally".
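
Roughly, as commands (a sketch - the OSD id and values are just examples):

# CRUSH weight: long-term capacity, usually the device size in TiB
ceph osd crush reweight osd.3 3.63869
# reweight: a temporary 0.0-1.0 override applied on top of the CRUSH weight
ceph osd reweight osd.3 0.95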


Upmap is not about disk size; please consult this PDF from Dan [1].


Currently
I am using the balancer (turned off) in crush-compat mode and have a few
8TB disks mixed with 4TB disks.

The upmap balancing mode doesn't balance by size; it balances by PG count.



k

[1] 
https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-30 Thread Marc Roos
>> 4. Revert all your reweights.
>
>Done

You mean the values in the reweight column or the weight column? Because 
from the commands in this thread I am assuming the weight column. Does 
this mean that the upmap is handling disk sizes automatically? Currently 
I am using the balancer (turned off) in crush-compat mode and have a few 
8TB disks mixed with 4TB disks.

ID CLASS WEIGHT    TYPE NAME      STATUS REWEIGHT PRI-AFF
-1   120.64897 root default
-230.48000 host c01
 0   hdd   8.0 osd.0  up  1.0 1.0
 3   hdd   3.0 osd.3  up  0.86809 1.0
 6   hdd   4.0 osd.6  up  1.0 1.0
 7   hdd   3.0 osd.7  up  0.86809 1.0
 8   hdd   4.0 osd.8  up  1.0 1.0

# sets the CRUSH weight (usually the disk size in TB; the WEIGHT column):
ceph osd crush reweight osd.6 4

# sets the reweight (the REWEIGHT column):
ceph osd reweight osd.6 1
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-29 Thread Konstantin Shalygin

I would still like to have a log somewhere to grep and inspect what the
balancer/upmap actually does in my cluster, or some ceph commands that
deliver some monitoring capability. Any suggestions?

Yes, in the ceph-mgr log, when the log level is DEBUG.
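
For example, something like this (a sketch - the mgr daemon name and log path
depend on your setup; run it on the node with the active mgr):

# raise the mgr debug level via the admin socket, then grep for balancer lines
ceph daemon mgr.$(hostname -s) config set debug_mgr 4/5
grep -i balancer /var/log/ceph/ceph-mgr.*.log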

You can get your cluster's upmaps via `ceph osd dump | grep upmap`.



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-28 Thread jesper


Hi. Just an update - this looks awesome, and in an 8x5 company, Christmas is
a good period to rebalance a cluster :-)

>> I'll try it out again - last time I tried it complained about older
>> clients - it should be better now.
> upmap is supported since kernel 4.13.
>
>> Second - should the reweights be set back to 1 then?
> Yes, also:
>
> 1. `ceph osd crush tunables optimal`

Done

> 2. All your buckets should be straw2; just in case, run `ceph osd crush
> set-all-straw-buckets-to-straw2`.

Done

> 3. Your hosts are imbalanced: elefant & capone have only eight 10TB drives,
> the other hosts have 12. So I recommend replacing the 8TB spinners with
> 10TB ones, or just shuffling them between hosts, like 2x8TB + 10x10TB.

Yes, we initially thought we could go with 3 OSD hosts, but then found out
that EC pools required more - and then added hosts.

> 4. Revert all your reweights.

Done

> 5. Let the balancer do its work: `ceph balancer mode upmap`, `ceph balancer on`.

So far it works awesome:
 sudo qms/server_documentation/ceph/ceph-osd-data-distribution hdd
hdd
x 
N   Min   MaxMedian   AvgStddev
x  72 50.82 55.65 52.88 52.916944 1.0002586

As compared to the best I got with reweighting:
$ sudo qms/server_documentation/ceph/ceph-osd-data-distribution hdd
hdd
x 
N   Min   MaxMedian   AvgStddev
x  72 45.36 54.98 52.63 52.131944 2.0746672


It took about 24 hours to rebalance, and it moved quite a few TB around.

I would still like to have a log somewhere to grep and inspect what the
balancer/upmap actually does in my cluster, or some ceph commands that
deliver some monitoring capability. Any suggestions?

-- 
Jesper

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-26 Thread Konstantin Shalygin

I'll try it out again - last time I tried it complained about older clients -
it should be better now.

upmap is supported since kernel 4.13.


Second - should the reweights be set back to 1 then?

Yes, also:

1. `ceph osd crush tunables optimal`

2. All your buckets should be straw2; just in case, run `ceph osd crush
set-all-straw-buckets-to-straw2`.


3. Your hosts are imbalanced: elefant & capone have only eight 10TB drives,
the other hosts have 12. So I recommend replacing the 8TB spinners with
10TB ones, or just shuffling them between hosts, like 2x8TB + 10x10TB.


4. Revert all your reweights.

5. Let the balancer do its work: `ceph balancer mode upmap`, `ceph balancer on`.
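
If you want to see what it would do first, the balancer module can also be
driven by hand, roughly like this (a sketch):

ceph balancer status                 # mode, active flag, queued plans
ceph balancer eval                   # score of the current distribution
ceph balancer optimize myplan        # compute a plan (the name is arbitrary)
ceph balancer show myplan            # the pg-upmap-items commands it would run
ceph balancer eval myplan            # expected score after the plan
ceph balancer execute myplan         # apply it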



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-26 Thread jesper
> Have a look at this thread on the mailing list:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg46506.html

OK, done. How do I see that it actually works?
Second - should the reweights be set back to 1 then?

Jesper

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-26 Thread Heðin Ejdesgaard Møller

On mik, 2018-12-26 at 16:30 +0100, jes...@krogh.cc wrote:
> > 
> > On mik, 2018-12-26 at 13:14 +0100, jes...@krogh.cc wrote:
> > > Thanks for the insight and links.
> > > 
> > > > As I can see you are on Luminous. The Balancer plugin is available
> > > > since Luminous [1], so you should use it instead of reweights,
> > > > especially in upmap mode [2].
> > > 
> > > I'll try it out again - last time I tried it complained about older
> > > clients - it should be better now.
> > > 
> > 
> > require_min_compat_client luminous is required for you to take advantage
> > of upmap.
> 
> $ sudo ceph osd set-require-min-compat-client luminous
> Error EPERM: cannot set require_min_compat_client to luminous: 54
> connected client(s) look like jewel (missing 0x800); add
> --yes-i-really-mean-it to do it anyway
> 
> We've standardized on the 4.15 kernel client on all CephFS clients; those
> are the 54. Would it be safe to ignore the above warning? Otherwise, which
> kernel do I need to go to?
> 
> 
Have a look at this thread on the mailing list:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg46506.html

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-26 Thread jesper
>
> On mik, 2018-12-26 at 13:14 +0100, jes...@krogh.cc wrote:
>> Thanks for the insight and links.
>>
>> > As I can see you are on Luminous. The Balancer plugin is available
>> > since Luminous [1], so you should use it instead of reweights,
>> > especially in upmap mode [2].
>>
>> I'll try it out again - last time I tried it complained about older
>> clients - it should be better now.
>>
> require_min_compat_client luminous is required for you to take advantage of
> upmap.

$ sudo ceph osd set-require-min-compat-client luminous
Error EPERM: cannot set require_min_compat_client to luminous: 54
connected client(s) look like jewel (missing 0x800); add
--yes-i-really-mean-it to do it anyway

We've standardized on the 4.15 kernel client on all CephFS clients; those
are the 54. Would it be safe to ignore the above warning? Otherwise, which
kernel do I need to go to?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-26 Thread Heðin Ejdesgaard Møller

On mik, 2018-12-26 at 13:14 +0100, jes...@krogh.cc wrote:
> Thanks for the insight and links.
> 
> > As I can see you are on Luminous. The Balancer plugin is available since
> > Luminous [1], so you should use it instead of reweights, especially
> > in upmap mode [2].
> 
> I'll try it out again - last time I tried it complained about older clients -
> it should be better now.
> 
require_min_compat_client luminous is required for you to take advantage of
upmap.
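
A quick way to see what the connected clients actually report before flipping
the flag (a sketch):

ceph features      # feature/release groups for connected clients, mons, osds
ceph osd set-require-min-compat-client luminous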

> > Also, maybe I can catch other CRUSH mistakes; can I see `ceph osd
> > crush show-tunables`, `ceph osd crush rule dump`, `ceph osd pool ls
> > detail`?
> 
> Here:
> $ sudo ceph osd crush show-tunables
> {
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 50,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 1,
> "chooseleaf_stable": 0,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 54,
> "profile": "hammer",
> "optimal_tunables": 0,
> "legacy_tunables": 0,
> "minimum_required_version": "hammer",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 1,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 1,
> "require_feature_tunables5": 0,
> "has_v5_rules": 0
> }
> 
> $ sudo ceph osd crush rule dump
> [
> {
> "rule_id": 0,
> "rule_name": "replicated_ruleset_hdd",
> "ruleset": 0,
> "type": 1,
> "min_size": 1,
> "max_size": 10,
> "steps": [
> {
> "op": "take",
> "item": -1,
> "item_name": "default~hdd"
> },
> {
> "op": "chooseleaf_firstn",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> },
> {
> "rule_id": 1,
> "rule_name": "replicated_ruleset_hdd_fast",
> "ruleset": 1,
> "type": 1,
> "min_size": 1,
> "max_size": 10,
> "steps": [
> {
> "op": "take",
> "item": -28,
> "item_name": "default~hdd_fast"
> },
> {
> "op": "chooseleaf_firstn",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> },
> {
> "rule_id": 2,
> "rule_name": "replicated_ruleset_ssd",
> "ruleset": 2,
> "type": 1,
> "min_size": 1,
> "max_size": 10,
> "steps": [
> {
> "op": "take",
> "item": -21,
> "item_name": "default~ssd"
> },
> {
> "op": "chooseleaf_firstn",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> },
> {
> "rule_id": 3,
> "rule_name": "cephfs_data_ec42",
> "ruleset": 3,
> "type": 3,
> "min_size": 3,
> "max_size": 6,
> "steps": [
> {
> "op": "set_chooseleaf_tries",
> "num": 5
> },
> {
> "op": "set_choose_tries",
> "num": 100
> },
> {
> "op": "take",
> "item": -1,
> "item_name": "default~hdd"
> },
> {
> "op": "chooseleaf_indep",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> }
> ]
> 
> $ sudo ceph osd pool ls detail
> pool 6 'kube' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 128 pgp_num 128 last_change 41045 flags hashpspool
> stripe_width 0 application rbd
> removed_snaps [1~3]
> pool 15 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule
> 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 41045 flags
> hashpspool stripe_width 0 application rgw
> pool 17 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 16 pgp_num 16 last_change 41045 lfor 0/36590
> flags hashpspool stripe_width 0 application rgw
> pool 18 'default.rgw.buckets.non-ec' replicated size 3 min_size 2
> crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 41045
> lfor 0/36595 flags hashpspool stripe_width 0 application rgw
> pool 19 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 16 pgp_num 16 last_change 41045 lfor 0/36608
> flags hashpspool stripe_width 0 application rgw
> pool 20 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins 

Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-26 Thread jesper


Thanks for the insight and links.

> As I can see you are on Luminous. The Balancer plugin is available since
> Luminous [1], so you should use it instead of reweights, especially
> in upmap mode [2].

I'll try it out again - last time I tried it complained about older clients -
it should be better now.

> Also, maybe I can catch other CRUSH mistakes; can I see `ceph osd
> crush show-tunables`, `ceph osd crush rule dump`, `ceph osd pool ls
> detail`?

Here:
$ sudo ceph osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "hammer",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "hammer",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 1,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 1,
"require_feature_tunables5": 0,
"has_v5_rules": 0
}

$ sudo ceph osd crush rule dump
[
{
"rule_id": 0,
"rule_name": "replicated_ruleset_hdd",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default~hdd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 1,
"rule_name": "replicated_ruleset_hdd_fast",
"ruleset": 1,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -28,
"item_name": "default~hdd_fast"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 2,
"rule_name": "replicated_ruleset_ssd",
"ruleset": 2,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -21,
"item_name": "default~ssd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 3,
"rule_name": "cephfs_data_ec42",
"ruleset": 3,
"type": 3,
"min_size": 3,
"max_size": 6,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -1,
"item_name": "default~hdd"
},
{
"op": "chooseleaf_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
]

$ sudo ceph osd pool ls detail
pool 6 'kube' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 128 pgp_num 128 last_change 41045 flags hashpspool
stripe_width 0 application rbd
removed_snaps [1~3]
pool 15 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule
0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 41045 flags
hashpspool stripe_width 0 application rgw
pool 17 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 16 pgp_num 16 last_change 41045 lfor 0/36590
flags hashpspool stripe_width 0 application rgw
pool 18 'default.rgw.buckets.non-ec' replicated size 3 min_size 2
crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 41045
lfor 0/36595 flags hashpspool stripe_width 0 application rgw
pool 19 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 16 pgp_num 16 last_change 41045 lfor 0/36608
flags hashpspool stripe_width 0 application rgw
pool 20 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 128 pgp_num 128 last_change 41045 flags hashpspool
stripe_width 0 application rbd
pool 26 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 41045 flags hashpspool
stripe_width 0 application rgw
pool 27 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 41045 flags hashpspool
stripe_width 0 application rgw
pool 28 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0

Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-25 Thread Konstantin Shalygin

$ sudo ceph osd df tree
ID  CLASS    WEIGHT    REWEIGHT  SIZE   USE     AVAIL   %USE  VAR  PGS  TYPE NAME
  -8  639.98883-  639T   327T   312T 51.24 1.00   - root
default
-10  111.73999-  111T 58509G 55915G 51.13 1.00   -
host bison
  78 hdd_fast   0.90900  1.0  930G  1123M   929G  0.12 0.00   0
osd.78
  79 hdd_fast   0.81799  1.0  837G  1123M   836G  0.13 0.00   0
osd.79
  20  hdd   9.09499  0.95000 9313G  4980G  4333G 53.47 1.04 204
osd.20
  28  hdd   9.09499  1.0 9313G  4612G  4700G 49.53 0.97 200
osd.28
  29  hdd   9.09499  1.0 9313G  4848G  4465G 52.05 1.02 211
osd.29
  33  hdd   9.09499  1.0 9313G  4759G  4553G 51.10 1.00 207
osd.33
  34  hdd   9.09499  1.0 9313G  4613G  4699G 49.54 0.97 195
osd.34
  35  hdd   9.09499  0.89250 9313G  4954G  4359G 53.19 1.04 206
osd.35
  36  hdd   9.09499  1.0 9313G  4724G  4588G 50.73 0.99 200
osd.36
  37  hdd   9.09499  1.0 9313G  5013G  4300G 53.83 1.05 214
osd.37
  38  hdd   9.09499  0.92110 9313G  4962G  4350G 53.28 1.04 206
osd.38
  39  hdd   9.09499  1.0 9313G  4960G  4353G 53.26 1.04 214
osd.39
  40  hdd   9.09499  1.0 9313G  5022G  4291G 53.92 1.05 216
osd.40
  41  hdd   9.09499  0.88235 9313G  5037G  4276G 54.09 1.06 203
osd.41
   7  ssd   0.87299  1.0  893G 18906M   875G  2.07 0.04 124
osd.7
  -7  102.74084-  102T 54402G 50805G 51.71 1.01   -
host bonnie
   0  hdd   7.27699  0.87642 7451G  4191G  3259G 56.25 1.10 175
osd.0
   1  hdd   7.27699  0.86200 7451G  3837G  3614G 51.49 1.01 163
osd.1
   2  hdd   7.27699  0.74664 7451G  3920G  3531G 52.61 1.03 169
osd.2
  11  hdd   7.27699  0.77840 7451G  3983G  3467G 53.46 1.04 169
osd.11
  13  hdd   9.09499  0.76595 9313G  4894G  4419G 52.55 1.03 201
osd.13
  14  hdd   9.09499  1.0 9313G  4350G  4963G 46.71 0.91 189
osd.14
  16  hdd   9.09499  0.92635 9313G  4879G  4434G 52.39 1.02 204
osd.16
  18  hdd   9.09499  0.67932 9313G  4634G  4678G 49.76 0.97 190
osd.18
  22  hdd   9.09499  0.93053 9313G  5085G  4228G 54.60 1.07 218
osd.22
  31  hdd   9.09499  0.88536 9313G  5152G  4160G 55.33 1.08 221
osd.31
  42  hdd   9.09499  0.84232 9313G  4796G  4516G 51.51 1.01 199
osd.42
  43  hdd   9.09499  0.87662 9313G  4656G  4657G 50.00 0.98 191
osd.43
   6  ssd   0.87299  1.0  894G 20643M   874G  2.25 0.04 134
osd.6
  -6  102.74100-  102T 53627G 51580G 50.97 0.99   -
host capone
   3  hdd   7.27699  0.84938 7451G  4028G  3422G 54.07 1.06 171
osd.3
   4  hdd   7.27699  0.83890 7451G  3909G  3542G 52.46 1.02 167
osd.4
   5  hdd   7.27699  1.0 7451G  3389G  4061G 45.49 0.89 151
osd.5
   9  hdd   7.27699  1.0 7451G  3710G  3740G 49.80 0.97 161
osd.9
  15  hdd   9.09499  1.0 9313G  4952G  4360G 53.18 1.04 206
osd.15
  17  hdd   9.09499  0.95000 9313G  4865G  4448G 52.24 1.02 202
osd.17
  23  hdd   9.09499  1.0 9313G  4984G  4329G 53.52 1.04 223
osd.23
  24  hdd   9.09499  1.0 9313G  4847G  4466G 52.05 1.02 202
osd.24
  25  hdd   9.09499  0.89929 9313G  4909G  4404G 52.71 1.03 205
osd.25
  30  hdd   9.09499  0.92787 9313G  4740G  4573G 50.90 0.99 202
osd.30
  74  hdd   9.09499  0.93146 9313G  4709G  4603G 50.57 0.99 199
osd.74
  75  hdd   9.09499  1.0 9313G  4559G  4753G 48.96 0.96 194
osd.75
   8  ssd   0.87299  1.0  893G 19593M   874G  2.14 0.04 129
osd.8
-16  102.74100-  102T 53985G 51222G 51.31 1.00   -
host elefant
  19  hdd   7.27699  1.0 7451G  3665G  3786G 49.19 0.96 152
osd.19
  21  hdd   7.27699  0.89539 7451G  4102G  3349G 55.05 1.07 169
osd.21
  64  hdd   7.27699  0.89275 7451G  3956G  3494G 53.10 1.04 171
osd.64
  65  hdd   7.27699  0.92513 7451G  3976G  3475G 53.36 1.04 171
osd.65
  66  hdd   9.09499  1.0 9313G  4674G  4638G 50.20 0.98 199
osd.66
  67  hdd   9.09499  1.0 9313G  4737G  4575G 50.87 0.99 201
osd.67
  68  hdd   9.09499  0.89973 9313G  4946G  4366G 53.11 1.04 211
osd.68
  69  hdd   9.09499  1.0 9313G  4648G  4665G 49.91 0.97 204
osd.69
  70  hdd   9.09499  0.89526 9313G  4907G  4405G 52.69 1.03 209
osd.70
  71  hdd   9.09499  0.84923 9313G  4690G  4622G 50.37 0.98 198
osd.71
  72  hdd   9.09499  0.87547 9313G  4976G  4336G 53.43 1.04 211
osd.72
  73  hdd   9.09499  1.0 9313G  4683G  4630G 50.29 0.98 200
osd.73
  10  ssd   0.87299  1.0  893G 19158M   875G  2.09 0.04 126
osd.10
-14  110.01300-  110T 58498G 54157G 51.93 1.01   -
host flodhest
  27  hdd   9.09499  1.0 9313G  4602G  4710G 49.42 0.96 199
osd.27
  32  hdd   9.09499  0.92557 9313G  5028G  4285G 53.99 1.05 215
osd.32
  54  hdd   9.09499  0.90724 9313G  4897G  4415G 52.59 1.03 203
osd.54
  55  hdd   9.09499  1.0 9313G  4867G  4446G 52.26 1.02 198
osd.55
  56  hdd   9.09499  1.0 9313G  4827G  4485G 51.84 1.01 202
osd.56
  

Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-25 Thread jesper
> Please paste your `ceph osd df tree` and `ceph osd dump | head -n 12`.

$ sudo ceph osd df tree
ID  CLASS    WEIGHT    REWEIGHT  SIZE   USE     AVAIL   %USE  VAR  PGS  TYPE NAME
 -8  639.98883-  639T   327T   312T 51.24 1.00   - root
default
-10  111.73999-  111T 58509G 55915G 51.13 1.00   -
host bison
 78 hdd_fast   0.90900  1.0  930G  1123M   929G  0.12 0.00   0
osd.78
 79 hdd_fast   0.81799  1.0  837G  1123M   836G  0.13 0.00   0
osd.79
 20  hdd   9.09499  0.95000 9313G  4980G  4333G 53.47 1.04 204
osd.20
 28  hdd   9.09499  1.0 9313G  4612G  4700G 49.53 0.97 200
osd.28
 29  hdd   9.09499  1.0 9313G  4848G  4465G 52.05 1.02 211
osd.29
 33  hdd   9.09499  1.0 9313G  4759G  4553G 51.10 1.00 207
osd.33
 34  hdd   9.09499  1.0 9313G  4613G  4699G 49.54 0.97 195
osd.34
 35  hdd   9.09499  0.89250 9313G  4954G  4359G 53.19 1.04 206
osd.35
 36  hdd   9.09499  1.0 9313G  4724G  4588G 50.73 0.99 200
osd.36
 37  hdd   9.09499  1.0 9313G  5013G  4300G 53.83 1.05 214
osd.37
 38  hdd   9.09499  0.92110 9313G  4962G  4350G 53.28 1.04 206
osd.38
 39  hdd   9.09499  1.0 9313G  4960G  4353G 53.26 1.04 214
osd.39
 40  hdd   9.09499  1.0 9313G  5022G  4291G 53.92 1.05 216
osd.40
 41  hdd   9.09499  0.88235 9313G  5037G  4276G 54.09 1.06 203
osd.41
  7  ssd   0.87299  1.0  893G 18906M   875G  2.07 0.04 124
osd.7
 -7  102.74084-  102T 54402G 50805G 51.71 1.01   -
host bonnie
  0  hdd   7.27699  0.87642 7451G  4191G  3259G 56.25 1.10 175
osd.0
  1  hdd   7.27699  0.86200 7451G  3837G  3614G 51.49 1.01 163
osd.1
  2  hdd   7.27699  0.74664 7451G  3920G  3531G 52.61 1.03 169
osd.2
 11  hdd   7.27699  0.77840 7451G  3983G  3467G 53.46 1.04 169
osd.11
 13  hdd   9.09499  0.76595 9313G  4894G  4419G 52.55 1.03 201
osd.13
 14  hdd   9.09499  1.0 9313G  4350G  4963G 46.71 0.91 189
osd.14
 16  hdd   9.09499  0.92635 9313G  4879G  4434G 52.39 1.02 204
osd.16
 18  hdd   9.09499  0.67932 9313G  4634G  4678G 49.76 0.97 190
osd.18
 22  hdd   9.09499  0.93053 9313G  5085G  4228G 54.60 1.07 218
osd.22
 31  hdd   9.09499  0.88536 9313G  5152G  4160G 55.33 1.08 221
osd.31
 42  hdd   9.09499  0.84232 9313G  4796G  4516G 51.51 1.01 199
osd.42
 43  hdd   9.09499  0.87662 9313G  4656G  4657G 50.00 0.98 191
osd.43
  6  ssd   0.87299  1.0  894G 20643M   874G  2.25 0.04 134
osd.6
 -6  102.74100-  102T 53627G 51580G 50.97 0.99   -
host capone
  3  hdd   7.27699  0.84938 7451G  4028G  3422G 54.07 1.06 171
osd.3
  4  hdd   7.27699  0.83890 7451G  3909G  3542G 52.46 1.02 167
osd.4
  5  hdd   7.27699  1.0 7451G  3389G  4061G 45.49 0.89 151
osd.5
  9  hdd   7.27699  1.0 7451G  3710G  3740G 49.80 0.97 161
osd.9
 15  hdd   9.09499  1.0 9313G  4952G  4360G 53.18 1.04 206
osd.15
 17  hdd   9.09499  0.95000 9313G  4865G  4448G 52.24 1.02 202
osd.17
 23  hdd   9.09499  1.0 9313G  4984G  4329G 53.52 1.04 223
osd.23
 24  hdd   9.09499  1.0 9313G  4847G  4466G 52.05 1.02 202
osd.24
 25  hdd   9.09499  0.89929 9313G  4909G  4404G 52.71 1.03 205
osd.25
 30  hdd   9.09499  0.92787 9313G  4740G  4573G 50.90 0.99 202
osd.30
 74  hdd   9.09499  0.93146 9313G  4709G  4603G 50.57 0.99 199
osd.74
 75  hdd   9.09499  1.0 9313G  4559G  4753G 48.96 0.96 194
osd.75
  8  ssd   0.87299  1.0  893G 19593M   874G  2.14 0.04 129
osd.8
-16  102.74100-  102T 53985G 51222G 51.31 1.00   -
host elefant
 19  hdd   7.27699  1.0 7451G  3665G  3786G 49.19 0.96 152
osd.19
 21  hdd   7.27699  0.89539 7451G  4102G  3349G 55.05 1.07 169
osd.21
 64  hdd   7.27699  0.89275 7451G  3956G  3494G 53.10 1.04 171
osd.64
 65  hdd   7.27699  0.92513 7451G  3976G  3475G 53.36 1.04 171
osd.65
 66  hdd   9.09499  1.0 9313G  4674G  4638G 50.20 0.98 199
osd.66
 67  hdd   9.09499  1.0 9313G  4737G  4575G 50.87 0.99 201
osd.67
 68  hdd   9.09499  0.89973 9313G  4946G  4366G 53.11 1.04 211
osd.68
 69  hdd   9.09499  1.0 9313G  4648G  4665G 49.91 0.97 204
osd.69
 70  hdd   9.09499  0.89526 9313G  4907G  4405G 52.69 1.03 209
osd.70
 71  hdd   9.09499  0.84923 9313G  4690G  4622G 50.37 0.98 198
osd.71
 72  hdd   9.09499  0.87547 9313G  4976G  4336G 53.43 1.04 211
osd.72
 73  hdd   9.09499  1.0 9313G  4683G  4630G 50.29 0.98 200
osd.73
 10  ssd   0.87299  1.0  893G 19158M   875G  2.09 0.04 126

Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2018-12-25 Thread Konstantin Shalygin

We hit an OSD_FULL last week on our cluster, with an average utilization
of less than 50% - thus hugely imbalanced. This has driven us to
adjust PGs upwards and reweight the OSDs more aggressively.

Question: What do people see as an "acceptable" variance across OSD's?
x 
 N   Min   MaxMedian   AvgStddev
x  72 45.49 56.25 52.35 51.878889 2.1764343

72 x 10TB drives. It seems hard to get further down -- thus churn will
most likely make it hard for us to stay at this level.

Currently we have ~158 PGs/OSD, which by my math gives 63GB/PG if they
were fully utilizing the disk - which leads me to think that somewhat
smaller PGs would give the balancing an easier job. Would it be OK to
go closer to 300 PGs/OSD? Would that be sane?

I can see that the default max is 300, but I have a hard time finding out
whether this is "recommendable" or just a "tunable".
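
The relevant option names have moved around between releases, so the easiest
way to check what a given cluster uses is probably to grep the running config
on a mon node (a sketch; the mon daemon name is an assumption):

ceph daemon mon.$(hostname -s) config show | grep -i pg_per_osd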

* We've now seen OSD_FULL trigger irrecoverable kernel bugs in the CephFS
kernel client on our 4.15 kernels - multiple times - and a forced reboot is
the only way out. We're on the Ubuntu kernels; I haven't done the diff to
upstream (yet), and I don't intend to run our production cluster disk-full
anywhere in the near future to test it out.


Please paste your `ceph osd df tree` and `ceph osd dump | head -n 12`.



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com