Re: [ceph-users] cluster can't remap objects after changing crush tree
Thanks a lot for your help.

Konstantin Shalygin writes:

> On 04/27/2018 05:05 PM, Igor Gajsin wrote:
>> I have a crush rule like
>
> You can still use device classes!
>
>> * host0 has a piece of data on osd.0
>
> Not a piece, the full object, if we are talking about non-EC pools.
>
>> * host1 has pieces of data on osd.1 and osd.2
>
> host1 has a copy on osd.1 *or* osd.2.
>
>> * host2 has no data
>
> host2 will also have one copy of the object.
>
> Also do not forget: the hosts with half the OSDs of host1 (i.e. host0
> and host2) will do "double work" in comparison.
> You can minimize this impact by decreasing the osd crush weights on host1.
>
> k

--
With best regards,
Igor Gajsin
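For reference, a sketch of the reweight Konstantin suggests, against the hypothetical layout above where host1 carries osd.1 and osd.2; the target weights here are made up and should be chosen to match the actual disk sizes:

    # halve the crush weights of the two OSDs on host1, so each of its
    # disks fills at roughly the same rate as the single-disk hosts
    ceph osd crush reweight osd.1 0.45
    ceph osd crush reweight osd.2 0.45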
Re: [ceph-users] cluster can't remap objects after changing crush tree
Thanks, man. Thanks a lot. Now I understand.

So, to be sure: if I have 3 hosts, the replication factor is also 3, and I
have a crush rule like

{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

then my data is replicated across hosts, not across osds, all hosts have
pieces of data, and a situation like:

* host0 has a piece of data on osd.0
* host1 has pieces of data on osd.1 and osd.2
* host2 has no data

is completely excluded?

Konstantin Shalygin writes:

> On 04/27/2018 04:37 PM, Igor Gajsin wrote:
>> pool 7 'rbd' replicated size 3 min_size 2 crush_rule 0
>
> Your pools have the proper size setting: 3. But your crush tree has only
> 2 buckets for this rule (i.e. your pods).
> To make this rule work you need a minimum of 3 'pod' buckets.
>
> k

--
With best regards,
Igor Gajsin
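A quick way to check this kind of placement directly (the object name below is arbitrary): ceph osd map prints the PG and the set of OSDs an object maps to, and with a chooseleaf step of type host, every OSD in that set should sit on a different host:

    # any name works; the command computes the mapping, the object
    # does not have to exist
    ceph osd map rbd testobj
    # verify that the OSD ids in the up/acting set belong to three
    # different hosts (compare with the output of `ceph osd tree`)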
Re: [ceph-users] cluster can't remap objects after changing crush tree
# ceph osd pool ls detail
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 958 lfor 0/909 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 954 flags hashpspool stripe_width 0 application cephfs
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 22 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 24 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 26 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1161 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~3]
pool 8 'kube' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1241 lfor 0/537 flags hashpspool stripe_width 0 application cephfs
        removed_snaps [1~5,7~2]

crush rule 3 is:

# ceph osd crush rule dump podshdd
{
    "rule_id": 3,
    "rule_name": "podshdd",
    "ruleset": 3,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "pod"
        },
        {
            "op": "emit"
        }
    ]
}

Konstantin Shalygin writes:

> On 04/26/2018 11:30 PM, Igor Gajsin wrote:
>> after assigning this rule to a pool it gets stuck in the same state:
>
> `ceph osd pool ls detail` please
>
> k

--
With best regards,
Igor Gajsin
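For completeness, the step that switches a pool between rules, with the pool and rule names taken from the output above:

    # point the kube pool at the custom rule...
    ceph osd pool set kube crush_rule podshdd
    # ...and back at the default one
    ceph osd pool set kube crush_rule replicated_rule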
Re: [ceph-users] cluster can't remap objects after changing crush tree
Hi Konstantin, thanks a lot for your response.

> Your crush is imbalanced:

I did it deliberately. The group2 nodes of my small-but-helpful ceph
cluster will also be the master nodes of my new small-but-helpful
kubernetes cluster. What I want to achieve is: there are 2 groups of
nodes, and even if one of them fails completely (during the k8s
installation), the other group will still hold a copy of the data.

But ok, let's rebalance it for test purposes:

ID  CLASS WEIGHT  TYPE NAME
 -1       3.63835 root default
 -9       1.81917     pod group1
 -3       0.90958         host feather0
  0   hdd 0.90958             osd.0
 -5       0.90959         host feather1
  1   hdd 0.90959             osd.1
-10       1.81918     pod group2
 -7       1.81918         host ds1
  2   hdd 0.90959             osd.2
  3   hdd 0.90959             osd.3

and add your rule:

> ceph osd crush rule create-replicated podshdd default pod hdd

# ceph osd crush rule dump podshdd
{
    "rule_id": 3,
    "rule_name": "podshdd",
    "ruleset": 3,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "pod"
        },
        {
            "op": "emit"
        }
    ]
}

After assigning this rule to a pool, the cluster gets stuck in the same state:

# ceph -s
  cluster:
    id:     34b66329-b511-4d97-9e07-7b1a0a6879ef
    health: HEALTH_WARN
            3971/42399 objects misplaced (9.366%)

  services:
    mon: 3 daemons, quorum feather0,feather1,ds1
    mgr: ds1(active), standbys: feather1, feather0
    mds: cephfs-1/1/1 up {0=feather0=up:active}, 2 up:standby
    osd: 4 osds: 4 up, 4 in; 128 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   8 pools, 264 pgs
    objects: 14133 objects, 49684 MB
    usage:   143 GB used, 3582 GB / 3725 GB avail
    pgs:     3971/42399 objects misplaced (9.366%)
             136 active+clean
             128 active+clean+remapped

  io:
    client:   19441 B/s rd, 29673 B/s wr, 18 op/s rd, 18 op/s wr

And what's interesting: at first it complained about something like
"objects misplaced (23%)" and ceph health detail showed a lot of degraded
pgs. But now there are no pgs in its output:

# ceph health detail
HEALTH_WARN 3971/42399 objects misplaced (9.366%)
OBJECT_MISPLACED 3971/42399 objects misplaced (9.366%)

and the number of misplaced objects has stopped decreasing; it has been
stuck at 9.366% for the last 30 minutes. If I switch the crush rule back
to default, the cluster returns to HEALTH_OK.

Konstantin Shalygin writes:

>> # ceph osd crush tree
>> ID  CLASS WEIGHT  TYPE NAME
>> -1        3.63835 root default
>> -9        0.90959     pod group1
>> -5        0.90959         host feather1
>>  1    hdd 0.90959             osd.1
>> -10       2.72876     pod group2
>> -7        1.81918         host ds1
>>  2    hdd 0.90959             osd.2
>>  3    hdd 0.90959             osd.3
>> -3        0.90958         host feather0
>>  0    hdd 0.90958             osd.0
>>
>> And I've made a rule
>>
>> # ceph osd crush rule dump pods
>> {
>>     "rule_id": 1,
>>     "rule_name": "pods",
>>     "ruleset": 1,
>>     "type": 1,
>>     "min_size": 1,
>>     "max_size": 10,
>>     "steps": [
>>         {
>>             "op": "take",
>>             "item": -1,
>>             "item_name": "default"
>>         },
>>         {
>>             "op": "chooseleaf_firstn",
>>             "num": 0,
>>             "type": "pod"
>>         },
>>         {
>>             "op": "emit"
>>         }
>>     ]
>> }
>
> 1. Assign a device class to your crush rule:
>
> ceph osd crush rule create-replicated pods default pod hdd
>
> 2. Your crush is imbalanced:
>
> *good*:
>
> root:
>   host1:
>     - osd0
>   host2:
>     - osd1
>   host3:
>     - osd3
>
> *bad*:
>
> root:
>   host1:
>     - osd0
>   host2:
>     - osd1
>     - osd2
>     - osd3
>
> k

--
With best regards,
Igor Gajsin
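One way to see why those 128 pgs never finish, without touching the live cluster, is to replay the rule offline with crushtool (a sketch; the file path is arbitrary). With only two 'pod' buckets, a rule that does chooseleaf over type pod can return at most two OSDs, so a size-3 pool can never be mapped completely:

    # export the compiled crush map currently in use
    ceph osd getcrushmap -o /tmp/crushmap
    # replay rule 3 for 3 replicas; any mapping listed here produced
    # fewer OSDs than requested
    crushtool -i /tmp/crushmap --test --rule 3 --num-rep 3 --show-bad-mappings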
[ceph-users] cluster can't remap objects after changing crush tree
Hi, I've got stuck on a problem with a crush rule. I have a small cluster
with 3 nodes and 4 osds. I decided to split it into 2 failure domains, so
I made 2 buckets and put the hosts into those buckets, following this
guide (the resulting commands are sketched at the end of this message):
http://www.sebastien-han.fr/blog/2014/01/13/ceph-managing-crush-with-the-cli/

Finally, I got a crush tree like:

# ceph osd crush tree
ID  CLASS WEIGHT  TYPE NAME
 -1       3.63835 root default
 -9       0.90959     pod group1
 -5       0.90959         host feather1
  1   hdd 0.90959             osd.1
-10       2.72876     pod group2
 -7       1.81918         host ds1
  2   hdd 0.90959             osd.2
  3   hdd 0.90959             osd.3
 -3       0.90958         host feather0
  0   hdd 0.90958             osd.0

And I made a rule:

# ceph osd crush rule dump pods
{
    "rule_id": 1,
    "rule_name": "pods",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "pod"
        },
        {
            "op": "emit"
        }
    ]
}

If I apply that rule to a pool, my cluster moves to:

# ceph -s
  cluster:
    id:     34b66329-b511-4d97-9e07-7b1a0a6879ef
    health: HEALTH_WARN
            6/42198 objects misplaced (0.014%)

  services:
    mon: 3 daemons, quorum feather0,feather1,ds1
    mgr: ds1(active), standbys: feather1, feather0
    mds: cephfs-1/1/1 up {0=feather0=up:active}, 2 up:standby
    osd: 4 osds: 4 up, 4 in; 64 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   8 pools, 264 pgs
    objects: 14066 objects, 49429 MB
    usage:   142 GB used, 3582 GB / 3725 GB avail
    pgs:     6/42198 objects misplaced (0.014%)
             200 active+clean
             64 active+clean+remapped

  io:
    client:   1897 kB/s wr, 0 op/s rd, 11 op/s wr

And it's frozen in that state; self-healing doesn't happen, it just stays
at objects misplaced / pgs active+clean+remapped. I think something is
wrong with my rule and the cluster can't move objects to rearrange them
according to the new rule. I'm missing something, and I have no idea what
exactly. Any help would be appreciated.

--
With best regards,
Igor Gajsin
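For reference, the bucket setup from that guide reduces to something like the following ('pod' is one of the standard crush bucket types, so the type itself needs no crush map edit):

    # create the two pod buckets and attach them to the root
    ceph osd crush add-bucket group1 pod
    ceph osd crush add-bucket group2 pod
    ceph osd crush move group1 root=default
    ceph osd crush move group2 root=default
    # move the hosts under their pods
    ceph osd crush move feather1 pod=group1
    ceph osd crush move ds1 pod=group2
    ceph osd crush move feather0 pod=group2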