Doing some more research this evening, it turns out the big divergence I've had between the POOLS %USED and GLOBAL %RAW USED is because the pool numbers are based on the amount of space the most-full OSD has left.
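To put rough numbers on that (back-of-envelope only, using the df tree further down and assuming a size=3 pool - not the exact calculation Ceph does, which also factors in CRUSH weights):

  # The fullest OSD (osd.24) is ~82% used, so only ~1.24TiB of its 6.99TiB is left.
  # If every OSD is assumed to fill at the same rate as that one, the pool can
  # only count on roughly 1.24TiB x 30 OSDs / 3 replicas = ~12.4TiB, even though
  # GLOBAL still shows ~58.8TiB of raw space free.
  ceph df            # POOLS view: %USED / MAX AVAIL, capped by the fullest OSD
  ceph osd df tree   # per-OSD %USE, to spot the outlier that is driving it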
So if you have 1 OSD that is disproportionately full, the %USED for POOLS will
only show you the capacity you have until that overweight OSD is full. I've
done quite a bit of reweighting and the %USED (POOLS) and %RAW USED (GLOBAL)
are now much closer together.

Cheers for your help so far Alwin - If you have any suggestions to improve
things based on my current tunables I would love to have your input.

Cheers,
Mark

On Wed, 8 May 2019 at 11:53, Mark Adams <m...@openvs.co.uk> wrote:
>
> On Wed, 8 May 2019 at 11:34, Alwin Antreich <a.antre...@proxmox.com> wrote:
>
>> On Wed, May 08, 2019 at 09:34:44AM +0100, Mark Adams wrote:
>> > Thanks for getting back to me Alwin. See my response below.
>> >
>> > I have the same size and count in each node, but I have had a disk
>> > failure (has been replaced) and also had issues with osds dropping when
>> > that memory allocation bug was around just before last christmas (Think
>> > it was when they made some bluestore updates, then the next release they
>> > increased the default memory allocation to rectify the issue) so that
>> > could have messed up the balance.
>> Ok, that can impact the distribution of PGs. Could you please post the
>> crush tunables too? Maybe there could be something to tweak, besides the
>> reweight-by-utilization.
>>
>
> "choose_local_tries": 0,
> "choose_local_fallback_tries": 0,
> "choose_total_tries": 50,
> "chooseleaf_descend_once": 1,
> "chooseleaf_vary_r": 1,
> "chooseleaf_stable": 1,
> "straw_calc_version": 1,
> "allowed_bucket_algs": 54,
> "profile": "jewel",
> "optimal_tunables": 1,
> "legacy_tunables": 0,
> "minimum_required_version": "jewel",
> "require_feature_tunables": 1,
> "require_feature_tunables2": 1,
> "has_v2_rules": 0,
> "require_feature_tunables3": 1,
> "has_v3_rules": 0,
> "has_v4_buckets": 1,
> "require_feature_tunables5": 1,
> "has_v5_rules": 0
>
>> > ceph osd df tree:
>> >
>> > ID CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
>> > -1       209.58572        - 210TiB  151TiB  58.8TiB 71.92 1.00    - root default
>> > -3        69.86191        - 69.9TiB 50.2TiB 19.6TiB 71.91 1.00    - host prod-pve1
>> >  0   ssd   6.98619  0.90002 6.99TiB 5.70TiB 1.29TiB 81.54 1.13  116     osd.0
>> >  1   ssd   6.98619  1.00000 6.99TiB 5.49TiB 1.49TiB 78.65 1.09  112     osd.1
>> >  2   ssd   6.98619  1.00000 6.99TiB 4.95TiB 2.03TiB 70.88 0.99  101     osd.2
>> >  4   ssd   6.98619  1.00000 6.99TiB 4.90TiB 2.09TiB 70.11 0.97  100     osd.4
>> >  5   ssd   6.98619  1.00000 6.99TiB 4.52TiB 2.47TiB 64.67 0.90   92     osd.5
>> >  6   ssd   6.98619  1.00000 6.99TiB 5.34TiB 1.64TiB 76.50 1.06  109     osd.6
>> >  7   ssd   6.98619  1.00000 6.99TiB 4.56TiB 2.42TiB 65.31 0.91   93     osd.7
>> >  8   ssd   6.98619  1.00000 6.99TiB 4.91TiB 2.08TiB 70.21 0.98  100     osd.8
>> >  9   ssd   6.98619  1.00000 6.99TiB 4.66TiB 2.32TiB 66.76 0.93   95     osd.9
>> > 30   ssd   6.98619  1.00000 6.99TiB 5.20TiB 1.78TiB 74.49 1.04  106     osd.30
>> > -5        69.86191        - 69.9TiB 50.3TiB 19.6TiB 71.93 1.00    - host prod-pve2
>> > 10   ssd   6.98619  1.00000 6.99TiB 4.47TiB 2.52TiB 63.92 0.89   91     osd.10
>> > 11   ssd   6.98619  1.00000 6.99TiB 4.86TiB 2.13TiB 69.53 0.97   99     osd.11
>> > 12   ssd   6.98619  1.00000 6.99TiB 4.46TiB 2.52TiB 63.91 0.89   91     osd.12
>> > 13   ssd   6.98619  1.00000 6.99TiB 4.71TiB 2.28TiB 67.43 0.94   96     osd.13
>> > 14   ssd   6.98619  1.00000 6.99TiB 5.50TiB 1.49TiB 78.68 1.09  112     osd.14
>> > 15   ssd   6.98619  1.00000 6.99TiB 5.20TiB 1.79TiB 74.38 1.03  106     osd.15
>> > 16   ssd   6.98619  1.00000 6.99TiB 4.66TiB 2.32TiB 66.74 0.93   95     osd.16
>> > 17   ssd   6.98619  1.00000 6.99TiB 5.51TiB 1.48TiB 78.84 1.10  112     osd.17
>> > 18   ssd   6.98619  1.00000 6.99TiB 5.40TiB 1.59TiB 77.24 1.07  110     osd.18
>> > 19   ssd   6.98619  1.00000 6.99TiB 5.50TiB 1.49TiB 78.66 1.09  112     osd.19
>> > -7        69.86191        - 69.9TiB 50.2TiB 19.6TiB 71.93 1.00    - host prod-pve3
>> > 20   ssd   6.98619  1.00000 6.99TiB 4.22TiB 2.77TiB 60.40 0.84   86     osd.20
>> > 21   ssd   6.98619  1.00000 6.99TiB 4.43TiB 2.56TiB 63.35 0.88   90     osd.21
>> > 22   ssd   6.98619  0.95001 6.99TiB 5.69TiB 1.30TiB 81.45 1.13  116     osd.22
>> > 23   ssd   6.98619  1.00000 6.99TiB 4.67TiB 2.32TiB 66.79 0.93   95     osd.23
>> > 24   ssd   6.98619  0.95001 6.99TiB 5.74TiB 1.24TiB 82.20 1.14  117     osd.24
>> > 25   ssd   6.98619  1.00000 6.99TiB 4.51TiB 2.47TiB 64.59 0.90   92     osd.25
>> > 26   ssd   6.98619  1.00000 6.99TiB 4.90TiB 2.09TiB 70.15 0.98  100     osd.26
>> > 27   ssd   6.98619  1.00000 6.99TiB 5.39TiB 1.59TiB 77.21 1.07  110     osd.27
>> > 28   ssd   6.98619  1.00000 6.99TiB 5.69TiB 1.29TiB 81.47 1.13  116     osd.28
>> > 29   ssd   6.98619  1.00000 6.99TiB 5.00TiB 1.98TiB 71.63 1.00  102     osd.29
>> >                            TOTAL    210TiB  151TiB  58.8TiB 71.92
>> >
>> > MIN/MAX VAR: 0.84/1.14 STDDEV: 6.44
>> How many placement groups do(es) your pool(s) have?
>>
>
> 1024
>
> Cheers!
>
>> > > > Is it safe enough to keep tweaking this? (I believe I should run ceph osd
>> > > > reweight-by-utilization 101 0.05 15) Is there any gotchas I need to be
>> > > > aware of when doing this apart from the obvious load of reshuffling the
>> > > > data around? The cluster has 30 OSDs and it looks like it will reweight 13.
>> > > Your cluster may get more and more unbalanced. Eg. making a OSD
>> > > replacement a bigger challenge.
>> > >
>> > It can make the balance worse? I thought the whole point was to get it back
>> > in balance! :)
>> Yes, but just meant, be carefull. ;) I have re-read the section in
>> ceph's docs and the reweights are relative to eachother. So, it should
>> not do much harm, but I faintly recall that I had issues with PG
>> distribution afterwards. My old memory. ^^
>>
>> --
>> Cheers,
>> Alwin
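PS in case it helps anyone finding this thread in the archives: my reading of
the three arguments to reweight-by-utilization (do check the docs for your
release, I may be off) is the overload threshold as a percentage of average
utilisation, the maximum reweight change per OSD, and the maximum number of
OSDs to adjust in one run. There is also a dry-run variant I would use first
to see what it would touch:

  # dry run: reports which OSDs would be reweighted and by how much
  ceph osd test-reweight-by-utilization 101 0.05 15
  # then the real thing
  ceph osd reweight-by-utilization 101 0.05 15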