One possibly relevant detail: the cluster has 8 nodes, and the new pool I created uses k=5, m=2 erasure coding.
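
In case it matters, the pool was created roughly along these lines (reciting from memory; the profile name below is just a placeholder, not necessarily what I actually used):

# ceph osd erasure-code-profile set k5m2-hdd k=5 m=2 crush-failure-domain=host crush-device-class=hdd
# ceph osd pool create fs-data-k5m2-hdd 2048 2048 erasure k5m2-hdd
# ceph osd pool set fs-data-k5m2-hdd allow_ec_overwrites true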

Vlad

On 4/9/20 11:28 AM, Vladimir Brik wrote:
Hello

I am running ceph 14.2.7 with the balancer in crush-compat mode (needed because of old clients), but it doesn't seem to be doing anything. It used to work in the past; I am not sure what changed. I created a big pool (~285 TB stored), and it doesn't look like it ever got balanced:

pool 43 'fs-data-k5m2-hdd' erasure size 7 min_size 6 crush_rule 7 object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn last_change 48647 lfor 0/42080/42102 flags hashpspool,ec_overwrites,nearfull stripe_width 20480 application cephfs

OSD utilization varies between ~50% and ~80%, with about 60% raw used. I am using a mixture of 9TB and 14TB drives. The number of PGs per drive varies between 103 and 207.

# ceph osd df | grep hdd | sort -k 17 | (head -n 2; tail -n 2)
160   hdd 12.53519  1.00000  13 TiB 6.0 TiB 5.9 TiB  74 KiB  12 GiB 6.6 TiB 47.74 0.79 120     up
146   hdd 12.53519  1.00000  13 TiB 6.0 TiB 6.0 TiB  51 MiB  13 GiB 6.5 TiB 48.17 0.80 119     up
 79   hdd  8.99799  1.00000 9.0 TiB 7.3 TiB 7.2 TiB  42 KiB  16 GiB 1.7 TiB 80.91 1.34 186     up
 62   hdd  8.99799  1.00000 9.0 TiB 7.3 TiB 7.2 TiB 112 KiB  16 GiB 1.7 TiB 81.44 1.35 189     up

# ceph balancer status
{
     "last_optimize_duration": "0:00:00.339635",
     "plans": [],
     "mode": "crush-compat",
     "active": true,
    "optimize_result": "Some osds belong to multiple subtrees: {0: ['default', 'default~hdd'], ...
     "last_optimize_started": "Thu Apr  9 11:17:40 2020"
}
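
I'm guessing the subtrees it is complaining about are the per-device-class shadow trees (default vs. default~hdd). If it would help, I can post the full output of something like:

# ceph osd crush tree --show-shadow
# ceph osd crush rule dump
# ceph osd crush weight-set ls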



Does anybody know how to debug this?


Thanks,

Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io