Hi ceph-users, I am trying to figure out how to get the ceph balancer to do its magic, as I currently have a pretty unbalanced distribution across OSDs, both SSD and HDD.
Cluster is 12.2.4 on Ubuntu 16.04. All OSDs have been migrated to bluestore. Specifically, my HDDs are the main driver for trying to run the balancer, as I have a near-full HDD.

> ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
> 4 hdd 7.28450 1.00000 7459G 4543G 2916G 60.91 0.91 126
> 21 hdd 7.28450 1.00000 7459G 4626G 2833G 62.02 0.92 130
> 0 hdd 7.28450 1.00000 7459G 4869G 2589G 65.28 0.97 133
> 5 hdd 7.28450 1.00000 7459G 4866G 2592G 65.24 0.97 136
> 14 hdd 7.28450 1.00000 7459G 4829G 2629G 64.75 0.96 138
> 8 hdd 7.28450 1.00000 7459G 4829G 2629G 64.75 0.96 139
> 7 hdd 7.28450 1.00000 7459G 4959G 2499G 66.49 0.99 141
> 23 hdd 7.28450 1.00000 7459G 5159G 2299G 69.17 1.03 142
> 2 hdd 7.28450 1.00000 7459G 5042G 2416G 67.60 1.01 144
> 1 hdd 7.28450 1.00000 7459G 5292G 2167G 70.95 1.06 145
> 10 hdd 7.28450 1.00000 7459G 5441G 2018G 72.94 1.09 146
> 19 hdd 7.28450 1.00000 7459G 5125G 2333G 68.72 1.02 146
> 9 hdd 7.28450 1.00000 7459G 5123G 2335G 68.69 1.02 146
> 18 hdd 7.28450 1.00000 7459G 5187G 2271G 69.54 1.04 149
> 22 hdd 7.28450 1.00000 7459G 5369G 2089G 71.98 1.07 150
> 12 hdd 7.28450 1.00000 7459G 5375G 2083G 72.07 1.07 152
> 17 hdd 7.28450 1.00000 7459G 5498G 1961G 73.71 1.10 152
> 11 hdd 7.28450 1.00000 7459G 5621G 1838G 75.36 1.12 154
> 15 hdd 7.28450 1.00000 7459G 5576G 1882G 74.76 1.11 154
> 20 hdd 7.28450 1.00000 7459G 5797G 1661G 77.72 1.16 158
> 6 hdd 7.28450 1.00000 7459G 5951G 1508G 79.78 1.19 164
> 3 hdd 7.28450 1.00000 7459G 5960G 1499G 79.90 1.19 166
> 16 hdd 7.28450 1.00000 7459G 6161G 1297G 82.60 1.23 169
> 13 hdd 7.28450 1.00000 7459G 6678G 780G 89.54 1.33 184

I sorted this on PGS, and you can see that the PG count tracks actual disk usage pretty closely. Since the balancer appears to try to distribute PGs more evenly, I should end up with more even usage as well. Hopefully that passes the sanity check.
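(For reference, the sort above is roughly just this one-liner; PGS is the 10th column in my ceph osd df output, so adjust the field number and class filter if yours differs:)

# sort OSD utilization by PG count (10th field of `ceph osd df`)
ceph osd df | grep hdd | sort -n -k10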
> ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
> 49 ssd 1.76109 1.00000 1803G 882G 920G 48.96 0.73 205
> 72 ssd 1.76109 1.00000 1803G 926G 876G 51.38 0.77 217
> 30 ssd 1.76109 1.00000 1803G 950G 852G 52.73 0.79 222
> 48 ssd 1.76109 1.00000 1803G 961G 842G 53.29 0.79 225
> 54 ssd 1.76109 1.00000 1803G 980G 823G 54.36 0.81 230
> 63 ssd 1.76109 1.00000 1803G 985G 818G 54.62 0.81 230
> 35 ssd 1.76109 1.00000 1803G 997G 806G 55.30 0.82 233
> 45 ssd 1.76109 1.00000 1803G 1002G 801G 55.58 0.83 234
> 67 ssd 1.76109 1.00000 1803G 1004G 799G 55.69 0.83 234
> 42 ssd 1.76109 1.00000 1803G 1006G 796G 55.84 0.83 235
> 52 ssd 1.76109 1.00000 1803G 1009G 793G 56.00 0.83 238
> 61 ssd 1.76109 1.00000 1803G 1014G 789G 56.24 0.84 238
> 68 ssd 1.76109 1.00000 1803G 1021G 782G 56.62 0.84 238
> 32 ssd 1.76109 1.00000 1803G 1021G 781G 56.67 0.84 240
> 65 ssd 1.76109 1.00000 1803G 1024G 778G 56.83 0.85 240
> 26 ssd 1.76109 1.00000 1803G 1022G 780G 56.72 0.84 241
> 59 ssd 1.76109 1.00000 1803G 1031G 771G 57.20 0.85 241
> 47 ssd 1.76109 1.00000 1803G 1035G 767G 57.42 0.86 242
> 37 ssd 1.76109 1.00000 1803G 1036G 767G 57.46 0.86 243
> 28 ssd 1.76109 1.00000 1803G 1043G 760G 57.85 0.86 245
> 40 ssd 1.76109 1.00000 1803G 1047G 755G 58.10 0.87 245
> 41 ssd 1.76109 1.00000 1803G 1046G 756G 58.06 0.86 245
> 62 ssd 1.76109 1.00000 1803G 1050G 752G 58.25 0.87 245
> 39 ssd 1.76109 1.00000 1803G 1051G 751G 58.30 0.87 246
> 56 ssd 1.76109 1.00000 1803G 1050G 752G 58.27 0.87 246
> 70 ssd 1.76109 1.00000 1803G 1041G 761G 57.75 0.86 246
> 73 ssd 1.76109 1.00000 1803G 1057G 746G 58.63 0.87 247
> 44 ssd 1.76109 1.00000 1803G 1056G 746G 58.58 0.87 248
> 38 ssd 1.76109 1.00000 1803G 1059G 743G 58.75 0.87 249
> 51 ssd 1.76109 1.00000 1803G 1063G 739G 58.99 0.88 249
> 33 ssd 1.76109 1.00000 1803G 1067G 736G 59.18 0.88 250
> 36 ssd 1.76109 1.00000 1803G 1071G 731G 59.41 0.88 251
> 55 ssd 1.76109 1.00000 1803G 1066G 737G 59.11 0.88 251
> 27 ssd 1.76109 1.00000 1803G 1078G 724G 59.81 0.89 252
> 31 ssd 1.76109 1.00000 1803G 1079G 724G 59.84 0.89 252
> 69 ssd 1.76109 1.00000 1803G 1075G 727G 59.63 0.89 252
> 46 ssd 1.76109 1.00000 1803G 1082G 721G 60.00 0.89 253
> 58 ssd 1.76109 1.00000 1803G 1081G 721G 59.98 0.89 253
> 66 ssd 1.76109 1.00000 1803G 1081G 722G 59.96 0.89 253
> 34 ssd 1.76109 1.00000 1803G 1091G 712G 60.52 0.90 255
> 43 ssd 1.76109 1.00000 1803G 1089G 713G 60.42 0.90 256
> 64 ssd 1.76109 1.00000 1803G 1097G 705G 60.87 0.91 257
> 24 ssd 1.76109 1.00000 1803G 1113G 690G 61.72 0.92 260
> 25 ssd 1.76109 1.00000 1803G 1146G 656G 63.58 0.95 269
> 29 ssd 1.76109 1.00000 1803G 1146G 656G 63.59 0.95 269
> 71 ssd 1.76109 1.00000 1803G 1151G 651G 63.88 0.95 269
> 57 ssd 1.76109 1.00000 1803G 1183G 619G 65.63 0.98 278
> 60 ssd 1.76109 1.00000 1803G 1183G 620G 65.60 0.98 278
> 53 ssd 1.76109 1.00000 1803G 1220G 583G 67.67 1.01 286
> 50 ssd 1.76109 1.00000 1803G 1283G 519G 71.19 1.06 303

The SSDs look much the same in that PG distribution matches usage, so I don't expect a bunch of empty PGs or anything like that. So, looking at the balancer, I tried to create and execute a plan; however, nothing appears to be happening. I assume I would see backfills start once it begins re-balancing the PGs (and thus the data).
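For completeness, the sequence I ran was roughly the following (the plan name is just a date-based label I made up):

ceph balancer mode crush-compat        # mode was already crush-compat
ceph balancer on                       # module reports active
ceph balancer optimize 180412.plan1    # build a plan from the current maps
ceph balancer eval 180412.plan1        # score the plan
ceph balancer execute 180412.plan1     # apply it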
> $ ceph balancer eval
> current cluster score 0.024025 (lower is better)
> $ ceph balancer optimize 180412.plan1
> $ ceph balancer status
> {
>     "active": true,
>     "plans": [
>         "180412.plan1"
>     ],
>     "mode": "crush-compat"
> }
> $ ceph balancer eval 180412.plan1
> plan 180412.plan1 final score 0.024025 (lower is better)
> $ ceph balancer show 180412.plan1
> # starting osdmap epoch 89751
> # starting crush version 250
> # mode crush-compat

So maybe I'm not giving it specific parameters? Here is a pastebin dump of ceph balancer dump $plan: https://pastebin.com/S6JwtY5Q

In another ML thread I found someone whose balancer config showed more keys than mine. This is what I have:

> $ ceph config-key dump
> {
>     "mgr/balancer/active": "1",
>     "mgr/balancer/mode": "crush-compat",
>     "mgr/influx/hostname": "",
>     "mgr/influx/password": "",
>     "mgr/influx/username": ""
> }

compared to what the other poster had:

> ceph config-key dump
> {
>     "mgr/balancer/active": "1",
>     "mgr/balancer/begin_time": "0830",
>     "mgr/balancer/end_time": "1600",
>     "mgr/balancer/max_misplaced": "0.01",
>     "mgr/balancer/mode": "crush-compat"
> }

So I figure there is some less-than-perfectly-documented step that I am missing, and that it is not in fact "turn it on and forget it", as Sage mentioned in his presentation, at least in its current form.
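If those extra keys matter, I could presumably set them the same way with something like this (values copied straight from that thread; I have not verified they make sense for my cluster):

ceph config-key set mgr/balancer/max_misplaced 0.01   # cap on how much data may be misplaced at once
ceph config-key set mgr/balancer/begin_time 0830      # only balance inside this daily time window
ceph config-key set mgr/balancer/end_time 1600

but I don't know whether any of these are actually required for the optimizer to produce moves, or whether the defaults should already work.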
Appreciate the help,

Reed
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com