On Wed, May 08, 2019 at 09:34:44AM +0100, Mark Adams wrote:
> Thanks for getting back to me Alwin. See my response below.
>
> I have the same size and count in each node, but I have had a disk failure
> (it has been replaced) and also had issues with OSDs dropping when that
> memory allocation bug was around just before last Christmas (I think it was
> when they made some bluestore updates, then the next release they increased
> the default memory allocation to rectify the issue), so that could have
> messed up the balance.

Ok, that can impact the distribution of PGs. Could you please post the crush
tunables too? Maybe there is something to tweak there, besides the
reweight-by-utilization.
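In case it is easier, this is roughly how I would collect them; the second
command dumps the whole CRUSH map as JSON, in case the bucket weights are
worth a look as well:

    # show the active CRUSH tunables profile
    ceph osd crush show-tunables

    # full CRUSH map (buckets, weights, rules) as JSON
    ceph osd crush dump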
>
> ceph osd df tree:
>
> ID CLASS    WEIGHT REWEIGHT    SIZE     USE   AVAIL  %USE  VAR PGS TYPE NAME
> -1       209.58572        -  210TiB  151TiB 58.8TiB 71.92 1.00   - root default
> -3        69.86191        - 69.9TiB 50.2TiB 19.6TiB 71.91 1.00   -     host prod-pve1
>  0   ssd   6.98619  0.90002 6.99TiB 5.70TiB 1.29TiB 81.54 1.13 116         osd.0
>  1   ssd   6.98619  1.00000 6.99TiB 5.49TiB 1.49TiB 78.65 1.09 112         osd.1
>  2   ssd   6.98619  1.00000 6.99TiB 4.95TiB 2.03TiB 70.88 0.99 101         osd.2
>  4   ssd   6.98619  1.00000 6.99TiB 4.90TiB 2.09TiB 70.11 0.97 100         osd.4
>  5   ssd   6.98619  1.00000 6.99TiB 4.52TiB 2.47TiB 64.67 0.90  92         osd.5
>  6   ssd   6.98619  1.00000 6.99TiB 5.34TiB 1.64TiB 76.50 1.06 109         osd.6
>  7   ssd   6.98619  1.00000 6.99TiB 4.56TiB 2.42TiB 65.31 0.91  93         osd.7
>  8   ssd   6.98619  1.00000 6.99TiB 4.91TiB 2.08TiB 70.21 0.98 100         osd.8
>  9   ssd   6.98619  1.00000 6.99TiB 4.66TiB 2.32TiB 66.76 0.93  95         osd.9
> 30   ssd   6.98619  1.00000 6.99TiB 5.20TiB 1.78TiB 74.49 1.04 106         osd.30
> -5        69.86191        - 69.9TiB 50.3TiB 19.6TiB 71.93 1.00   -     host prod-pve2
> 10   ssd   6.98619  1.00000 6.99TiB 4.47TiB 2.52TiB 63.92 0.89  91         osd.10
> 11   ssd   6.98619  1.00000 6.99TiB 4.86TiB 2.13TiB 69.53 0.97  99         osd.11
> 12   ssd   6.98619  1.00000 6.99TiB 4.46TiB 2.52TiB 63.91 0.89  91         osd.12
> 13   ssd   6.98619  1.00000 6.99TiB 4.71TiB 2.28TiB 67.43 0.94  96         osd.13
> 14   ssd   6.98619  1.00000 6.99TiB 5.50TiB 1.49TiB 78.68 1.09 112         osd.14
> 15   ssd   6.98619  1.00000 6.99TiB 5.20TiB 1.79TiB 74.38 1.03 106         osd.15
> 16   ssd   6.98619  1.00000 6.99TiB 4.66TiB 2.32TiB 66.74 0.93  95         osd.16
> 17   ssd   6.98619  1.00000 6.99TiB 5.51TiB 1.48TiB 78.84 1.10 112         osd.17
> 18   ssd   6.98619  1.00000 6.99TiB 5.40TiB 1.59TiB 77.24 1.07 110         osd.18
> 19   ssd   6.98619  1.00000 6.99TiB 5.50TiB 1.49TiB 78.66 1.09 112         osd.19
> -7        69.86191        - 69.9TiB 50.2TiB 19.6TiB 71.93 1.00   -     host prod-pve3
> 20   ssd   6.98619  1.00000 6.99TiB 4.22TiB 2.77TiB 60.40 0.84  86         osd.20
> 21   ssd   6.98619  1.00000 6.99TiB 4.43TiB 2.56TiB 63.35 0.88  90         osd.21
> 22   ssd   6.98619  0.95001 6.99TiB 5.69TiB 1.30TiB 81.45 1.13 116         osd.22
> 23   ssd   6.98619  1.00000 6.99TiB 4.67TiB 2.32TiB 66.79 0.93  95         osd.23
> 24   ssd   6.98619  0.95001 6.99TiB 5.74TiB 1.24TiB 82.20 1.14 117         osd.24
> 25   ssd   6.98619  1.00000 6.99TiB 4.51TiB 2.47TiB 64.59 0.90  92         osd.25
> 26   ssd   6.98619  1.00000 6.99TiB 4.90TiB 2.09TiB 70.15 0.98 100         osd.26
> 27   ssd   6.98619  1.00000 6.99TiB 5.39TiB 1.59TiB 77.21 1.07 110         osd.27
> 28   ssd   6.98619  1.00000 6.99TiB 5.69TiB 1.29TiB 81.47 1.13 116         osd.28
> 29   ssd   6.98619  1.00000 6.99TiB 5.00TiB 1.98TiB 71.63 1.00 102         osd.29
>                       TOTAL  210TiB  151TiB 58.8TiB 71.92
> MIN/MAX VAR: 0.84/1.14  STDDEV: 6.44

How many placement groups do(es) your pool(s) have?

> > > Is it safe enough to keep tweaking this? (I believe I should run ceph osd
> > > reweight-by-utilization 101 0.05 15) Are there any gotchas I need to be
> > > aware of when doing this apart from the obvious load of reshuffling the
> > > data around? The cluster has 30 OSDs and it looks like it will reweight
> > > 13.
> >
> > Your cluster may get more and more unbalanced, e.g. making an OSD
> > replacement a bigger challenge.
>
> It can make the balance worse? I thought the whole point was to get it back
> in balance! :)

Yes, but I just meant: be careful. ;) I have re-read the section in Ceph's
docs and the reweights are relative to each other. So it should not do much
harm, but I faintly recall that I had issues with PG distribution afterwards.
My old memory. ^^
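Regarding the above, a rough sketch of how I would check the PG numbers and
preview the reweight first; the pool name "vm-pool" below is only a
placeholder for whatever your pool is actually called:

    # pg_num/pgp_num, size and flags for every pool
    ceph osd pool ls detail

    # or just the pg_num of a single pool (pool name is a placeholder)
    ceph osd pool get vm-pool pg_num

    # dry run with the same arguments as reweight-by-utilization; it only
    # reports what it would change, nothing is applied
    ceph osd test-reweight-by-utilization 101 0.05 15

The test variant should show you up front which OSDs would be touched and by
how much, before any data starts moving.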
--
Cheers,
Alwin

_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user