Hi ceph-users,

I am trying to figure out how to get the ceph balancer to do its magic, as I currently have a pretty unbalanced distribution across OSDs, both SSD and HDD.

Cluster is 12.2.4 on Ubuntu 16.04.
All OSDs have been migrated to BlueStore.

Specifically, the HDDs are the main driver behind running the balancer, as I have a near-full HDD.

> ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL  %USE  VAR  PGS
>  4   hdd 7.28450  1.00000 7459G 4543G  2916G 60.91 0.91 126
> 21   hdd 7.28450  1.00000 7459G 4626G  2833G 62.02 0.92 130
>  0   hdd 7.28450  1.00000 7459G 4869G  2589G 65.28 0.97 133
>  5   hdd 7.28450  1.00000 7459G 4866G  2592G 65.24 0.97 136
> 14   hdd 7.28450  1.00000 7459G 4829G  2629G 64.75 0.96 138
>  8   hdd 7.28450  1.00000 7459G 4829G  2629G 64.75 0.96 139
>  7   hdd 7.28450  1.00000 7459G 4959G  2499G 66.49 0.99 141
> 23   hdd 7.28450  1.00000 7459G 5159G  2299G 69.17 1.03 142
>  2   hdd 7.28450  1.00000 7459G 5042G  2416G 67.60 1.01 144
>  1   hdd 7.28450  1.00000 7459G 5292G  2167G 70.95 1.06 145
> 10   hdd 7.28450  1.00000 7459G 5441G  2018G 72.94 1.09 146
> 19   hdd 7.28450  1.00000 7459G 5125G  2333G 68.72 1.02 146
>  9   hdd 7.28450  1.00000 7459G 5123G  2335G 68.69 1.02 146
> 18   hdd 7.28450  1.00000 7459G 5187G  2271G 69.54 1.04 149
> 22   hdd 7.28450  1.00000 7459G 5369G  2089G 71.98 1.07 150
> 12   hdd 7.28450  1.00000 7459G 5375G  2083G 72.07 1.07 152
> 17   hdd 7.28450  1.00000 7459G 5498G  1961G 73.71 1.10 152
> 11   hdd 7.28450  1.00000 7459G 5621G  1838G 75.36 1.12 154
> 15   hdd 7.28450  1.00000 7459G 5576G  1882G 74.76 1.11 154
> 20   hdd 7.28450  1.00000 7459G 5797G  1661G 77.72 1.16 158
>  6   hdd 7.28450  1.00000 7459G 5951G  1508G 79.78 1.19 164
>  3   hdd 7.28450  1.00000 7459G 5960G  1499G 79.90 1.19 166
> 16   hdd 7.28450  1.00000 7459G 6161G  1297G 82.60 1.23 169
> 13   hdd 7.28450  1.00000 7459G 6678G   780G 89.54 1.33 184

I sorted this on PGS, and you can see that PG count tracks actual disk usage pretty closely. Since the balancer attempts to distribute PGs more evenly, I should end up with a more even distribution of usage as well.
Hopefully that passes the sanity check.
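
For what it's worth, the listing above came from something along these lines, sorting ceph osd df on the PGS column (the field number is just what it happens to be in my output):

$ ceph osd df | grep hdd | sort -nk10   # field 10 is PGS in this output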

> ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL  %USE  VAR  PGS
> 49   ssd 1.76109  1.00000 1803G  882G   920G 48.96 0.73 205
> 72   ssd 1.76109  1.00000 1803G  926G   876G 51.38 0.77 217
> 30   ssd 1.76109  1.00000 1803G  950G   852G 52.73 0.79 222
> 48   ssd 1.76109  1.00000 1803G  961G   842G 53.29 0.79 225
> 54   ssd 1.76109  1.00000 1803G  980G   823G 54.36 0.81 230
> 63   ssd 1.76109  1.00000 1803G  985G   818G 54.62 0.81 230
> 35   ssd 1.76109  1.00000 1803G  997G   806G 55.30 0.82 233
> 45   ssd 1.76109  1.00000 1803G 1002G   801G 55.58 0.83 234
> 67   ssd 1.76109  1.00000 1803G 1004G   799G 55.69 0.83 234
> 42   ssd 1.76109  1.00000 1803G 1006G   796G 55.84 0.83 235
> 52   ssd 1.76109  1.00000 1803G 1009G   793G 56.00 0.83 238
> 61   ssd 1.76109  1.00000 1803G 1014G   789G 56.24 0.84 238
> 68   ssd 1.76109  1.00000 1803G 1021G   782G 56.62 0.84 238
> 32   ssd 1.76109  1.00000 1803G 1021G   781G 56.67 0.84 240
> 65   ssd 1.76109  1.00000 1803G 1024G   778G 56.83 0.85 240
> 26   ssd 1.76109  1.00000 1803G 1022G   780G 56.72 0.84 241
> 59   ssd 1.76109  1.00000 1803G 1031G   771G 57.20 0.85 241
> 47   ssd 1.76109  1.00000 1803G 1035G   767G 57.42 0.86 242
> 37   ssd 1.76109  1.00000 1803G 1036G   767G 57.46 0.86 243
> 28   ssd 1.76109  1.00000 1803G 1043G   760G 57.85 0.86 245
> 40   ssd 1.76109  1.00000 1803G 1047G   755G 58.10 0.87 245
> 41   ssd 1.76109  1.00000 1803G 1046G   756G 58.06 0.86 245
> 62   ssd 1.76109  1.00000 1803G 1050G   752G 58.25 0.87 245
> 39   ssd 1.76109  1.00000 1803G 1051G   751G 58.30 0.87 246
> 56   ssd 1.76109  1.00000 1803G 1050G   752G 58.27 0.87 246
> 70   ssd 1.76109  1.00000 1803G 1041G   761G 57.75 0.86 246
> 73   ssd 1.76109  1.00000 1803G 1057G   746G 58.63 0.87 247
> 44   ssd 1.76109  1.00000 1803G 1056G   746G 58.58 0.87 248
> 38   ssd 1.76109  1.00000 1803G 1059G   743G 58.75 0.87 249
> 51   ssd 1.76109  1.00000 1803G 1063G   739G 58.99 0.88 249
> 33   ssd 1.76109  1.00000 1803G 1067G   736G 59.18 0.88 250
> 36   ssd 1.76109  1.00000 1803G 1071G   731G 59.41 0.88 251
> 55   ssd 1.76109  1.00000 1803G 1066G   737G 59.11 0.88 251
> 27   ssd 1.76109  1.00000 1803G 1078G   724G 59.81 0.89 252
> 31   ssd 1.76109  1.00000 1803G 1079G   724G 59.84 0.89 252
> 69   ssd 1.76109  1.00000 1803G 1075G   727G 59.63 0.89 252
> 46   ssd 1.76109  1.00000 1803G 1082G   721G 60.00 0.89 253
> 58   ssd 1.76109  1.00000 1803G 1081G   721G 59.98 0.89 253
> 66   ssd 1.76109  1.00000 1803G 1081G   722G 59.96 0.89 253
> 34   ssd 1.76109  1.00000 1803G 1091G   712G 60.52 0.90 255
> 43   ssd 1.76109  1.00000 1803G 1089G   713G 60.42 0.90 256
> 64   ssd 1.76109  1.00000 1803G 1097G   705G 60.87 0.91 257
> 24   ssd 1.76109  1.00000 1803G 1113G   690G 61.72 0.92 260
> 25   ssd 1.76109  1.00000 1803G 1146G   656G 63.58 0.95 269
> 29   ssd 1.76109  1.00000 1803G 1146G   656G 63.59 0.95 269
> 71   ssd 1.76109  1.00000 1803G 1151G   651G 63.88 0.95 269
> 57   ssd 1.76109  1.00000 1803G 1183G   619G 65.63 0.98 278
> 60   ssd 1.76109  1.00000 1803G 1183G   620G 65.60 0.98 278
> 53   ssd 1.76109  1.00000 1803G 1220G   583G 67.67 1.01 286
> 50   ssd 1.76109  1.00000 1803G 1283G   519G 71.19 1.06 303

The SSDs are roughly the same in that PG distribution matches usage, so I don't expect a bunch of empty PGs or anything like that.

So, looking at the balancer, I tried to create a plan and execute it, but nothing appears to be happening.
I would expect to see backfills start once it begins re-balancing the PGs (and thus the data).

> $ ceph balancer eval
> current cluster score 0.024025 (lower is better)


> $ ceph balancer optimize 180412.plan1


> $ ceph balancer status
> {
>     "active": true,
>     "plans": [
>         "180412.plan1"
>     ],
>     "mode": "crush-compat"
> }


> $ ceph balancer eval 180412.plan1
> plan 180412.plan1 final score 0.024025 (lower is better)

> $ ceph balancer show 180412.plan1
> # starting osdmap epoch 89751
> # starting crush version 250
> # mode crush-compat

So maybe I’m not giving it specific parameters?
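
For reference, my understanding of the manual plan workflow from the docs is roughly the following (the plan name is just an example), which is why I expected the execute step to kick off backfills:

$ ceph balancer eval                # score the current distribution
$ ceph balancer optimize myplan     # build a plan against the current osdmap
$ ceph balancer show myplan         # inspect what the plan would change
$ ceph balancer eval myplan         # score the cluster as if the plan were applied
$ ceph balancer execute myplan      # actually apply the plan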

Here is a pastebin dump of ceph balancer dump $plan: https://pastebin.com/S6JwtY5Q

In another ML thread I found someone whose balancer config had more keys set than mine. Here is what I have:
> $ ceph config-key dump
> {
>     "mgr/balancer/active": "1",
>     "mgr/balancer/mode": "crush-compat",
>     "mgr/influx/hostname": "",
>     "mgr/influx/password": "",
>     "mgr/influx/username": ""
> }

And this is what they had posted in that other thread:
> ceph config-key dump
> {
>    "mgr/balancer/active": "1",
>    "mgr/balancer/begin_time": "0830",
>    "mgr/balancer/end_time": "1600",
>    "mgr/balancer/max_misplaced": "0.01",
>    "mgr/balancer/mode": "crush-compat”
> }
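
If those extra keys are what I'm missing, I assume they would be set with something like this (the values are just the ones from that thread):

$ ceph config-key set mgr/balancer/begin_time 0830
$ ceph config-key set mgr/balancer/end_time 1600
$ ceph config-key set mgr/balancer/max_misplaced 0.01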

So I figure there is some less-than-perfectly-documented step that I am missing, and that it is not in fact "turn on and forget it", as Sage mentioned in his presentation, at least in its current form.
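
For completeness, my understanding of the automatic "turn it on" side (which the status output above suggests is already in effect) is simply:

$ ceph balancer mode crush-compat   # pick the balancing mode
$ ceph balancer on                  # let the mgr create and execute plans on its own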

Appreciate the help,

Reed

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
