Re: [ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'
On 02/23/2018 12:42 AM, Mike Lovell wrote:
> was the pg-upmap feature used to force a pg to get mapped to a
> particular osd?

Yes, it was. This is a semi-production cluster where the balancer module
has been enabled in upmap mode. It seems it remapped PGs to OSDs on the
same host:

root@man:~# ceph osd dump|grep pg_upmap|grep 1.41
pg_upmap_items 1.41 [9,15,11,7,10,2]
root@man:~#

I don't know exactly what I have to extract from that output, but it does
seem to be the case here. I removed the upmap entry for this PG, which
fixed it:

$ ceph osd rm-pg-upmap-items 1.41

I also disabled the balancer for now (will report an issue) and removed
all other upmap entries:

$ ceph osd dump|grep pg_upmap_items|awk '{print $2}'|xargs -n 1 ceph osd rm-pg-upmap-items

Thanks for the hint!

Wido

> mike
>
> On Thu, Feb 22, 2018 at 10:28 AM, Wido den Hollander wrote:
>> [snip: original message quoted in full; see the bottom of the thread]
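A note for anyone reading that output: pg_upmap_items entries are pairs of
(original, replacement) OSD ids, so [9,15,11,7,10,2] means osd.9 is
replaced by osd.15, osd.11 by osd.7, and osd.10 by osd.2. The exceptions
are applied on top of the CRUSH result, which is how the PG could end up
with two replicas on n02 even though the CRUSH rule is correct. A minimal
sketch of the cleanup Wido describes, with the balancer stopped first so
it does not re-inject the entries (the PG id is the one from this thread):

    # Stop the balancer first; otherwise it may re-create the upmap
    # entries right after they are removed.
    ceph balancer off

    # Remove the exception for the one affected PG...
    ceph osd rm-pg-upmap-items 1.41

    # ...or drop every pg_upmap_items entry in one go, as Wido did.
    ceph osd dump | grep pg_upmap_items | awk '{print $2}' \
        | xargs -n 1 ceph osd rm-pg-upmap-items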
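Once the entries are removed, a quick check that every replica of the PG
now sits on a distinct host; a sketch, assuming the Luminous JSON output
of 'ceph pg map' exposes an 'up' array (verify with -f json-pretty on your
version):

    # List the host behind each OSD in the PG's "up" set; any line
    # printed by uniq -d is a host holding more than one replica.
    for osd in $(ceph pg map 1.41 -f json | jq -r '.up[]'); do
        ceph osd find "$osd" | jq -r '.crush_location.host'
    done | sort | uniq -d
    # No output means the host failure domain is respected again.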
Re: [ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'
was the pg-upmap feature used to force a pg to get mapped to a particular
osd?

mike

On Thu, Feb 22, 2018 at 10:28 AM, Wido den Hollander wrote:
> Hi,
>
> I have a situation with a cluster which was recently upgraded to
> Luminous and has a PG mapped to OSDs on the same host.
>
> [snip: original message quoted in full; see the bottom of the thread]
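For context, a sketch of how such an exception comes about. The balancer's
upmap mode injects explicit pg_upmap_items entries into the OSDMap, and
the same mechanism is available by hand; the PG and OSD ids below are
taken from the thread purely for illustration:

    # pg-upmap requires Luminous-capable clients; this cluster already
    # enforces that ("require_min_compat_client luminous" in the osd dump).
    ceph osd set-require-min-compat-client luminous

    # Use osd.15 wherever CRUSH chose osd.9 for pg 1.41; this is stored
    # in the OSDMap as "pg_upmap_items 1.41 [9,15]".
    ceph osd pg-upmap-items 1.41 9 15

    # Remove the exception again.
    ceph osd rm-pg-upmap-items 1.41

In this thread the exceptions were written by the balancer itself, which
is what makes a mapping that violates the failure domain look like a bug
rather than an admin mistake.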
Re: [ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'
On Thu, Feb 22, 2018 at 9:29 AM Wido den Hollander wrote:
> Hi,
>
> I have a situation with a cluster which was recently upgraded to
> Luminous and has a PG mapped to OSDs on the same host.
>
> [snip]
>
> I also downloaded the CRUSHmap and ran crushtool with --test and
> --show-mappings, but that didn't show any PG mapped to the same host.

What *was* the mapping for the PG in question, then?

At a first guess, it sounds to me like CRUSH is failing to map the
appropriate number of participants on this PG, so one of the extant OSDs
from a prior epoch is getting drafted. I would expect this to show up as
a remapped PG.
-Greg

> Any ideas on what might be going on here?
>
> Wido
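Greg's guess is easy to test: when CRUSH cannot fill out a mapping, the PG
is reported as remapped and its up and acting sets differ. A sketch of the
check (PG id from the thread):

    # Compare the CRUSH-computed "up" set with the currently serving
    # "acting" set; a difference means an OSD from a prior epoch is
    # being kept around.
    ceph pg map 1.41

    # Or list every PG whose state includes "remapped".
    ceph pg dump pgs_brief | grep remapped

In Wido's output, however, up and acting are identical ([15,7,4]), which
points away from a failed CRUSH mapping and toward the explicit upmap
exception that was eventually found.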
[ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'
Hi,

I have a situation with a cluster which was recently upgraded to Luminous
and has a PG mapped to OSDs on the same host.

root@man:~# ceph pg map 1.41
osdmap e21543 pg 1.41 (1.41) -> up [15,7,4] acting [15,7,4]
root@man:~#

root@man:~# ceph osd find 15|jq -r '.crush_location.host'
n02
root@man:~# ceph osd find 7|jq -r '.crush_location.host'
n01
root@man:~# ceph osd find 4|jq -r '.crush_location.host'
n02
root@man:~#

As you can see, OSDs 15 and 4 are both on the host 'n02'.

This PG went inactive when the machine hosting both OSDs went down for
maintenance.

My first suspect was the CRUSHMap and the rules, but those are fine:

rule replicated_ruleset {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

This is the only rule in the CRUSHMap.

ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       19.50325 root default
-2        2.78618     host n01
 5   ssd  0.92999         osd.5      up  1.00000 1.00000
 7   ssd  0.92619         osd.7      up  1.00000 1.00000
14   ssd  0.92999         osd.14     up  1.00000 1.00000
-3        2.78618     host n02
 4   ssd  0.92999         osd.4      up  1.00000 1.00000
 8   ssd  0.92619         osd.8      up  1.00000 1.00000
15   ssd  0.92999         osd.15     up  1.00000 1.00000
-4        2.78618     host n03
 3   ssd  0.92999         osd.3      up  0.94577 1.00000
 9   ssd  0.92619         osd.9      up  0.82001 1.00000
16   ssd  0.92999         osd.16     up  0.84885 1.00000
-5        2.78618     host n04
 2   ssd  0.92999         osd.2      up  0.93501 1.00000
10   ssd  0.92619         osd.10     up  0.76031 1.00000
17   ssd  0.92999         osd.17     up  0.82883 1.00000
-6        2.78618     host n05
 6   ssd  0.92999         osd.6      up  0.84470 1.00000
11   ssd  0.92619         osd.11     up  0.80530 1.00000
18   ssd  0.92999         osd.18     up  0.86501 1.00000
-7        2.78618     host n06
 1   ssd  0.92999         osd.1      up  0.88353 1.00000
12   ssd  0.92619         osd.12     up  0.79602 1.00000
19   ssd  0.92999         osd.19     up  0.83171 1.00000
-8        2.78618     host n07
 0   ssd  0.92999         osd.0      up  1.00000 1.00000
13   ssd  0.92619         osd.13     up  0.86043 1.00000
20   ssd  0.92999         osd.20     up  0.77153 1.00000

Here you see osd.15 and osd.4 on the same host 'n02'.

This cluster was upgraded from Hammer to Jewel and now Luminous and it
doesn't have the latest tunables yet, but should that matter? I never
encountered this before.

tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

I don't want to touch this yet in case this is a bug or a glitch in the
matrix somewhere.

I hope it's just an admin mistake, but so far I'm not able to find a clue
pointing to that.

root@man:~# ceph osd dump|head -n 12
epoch 21545
fsid 0b6fb388-6233-4eeb-a55c-476ed12bdf0a
created 2015-04-28 14:43:53.950159
modified 2018-02-22 17:56:42.497849
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 22
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release luminous
root@man:~#

I also downloaded the CRUSHmap and ran crushtool with --test and
--show-mappings, but that didn't show any PG mapped to the same host.

Any ideas on what might be going on here?

Wido
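A sketch of the offline crushtool check Wido describes (file names are
illustrative):

    # Fetch and decompile the cluster's current CRUSH map.
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # Replay mappings for rule 0 with 3 replicas. --show-bad-mappings
    # prints only the inputs CRUSH could not map according to the rule;
    # --show-mappings prints every computed mapping for inspection.
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-mappings

A clean result here is consistent with the eventual diagnosis: the upmap
exceptions live in the OSDMap, not in the CRUSH map, so crushtool never
sees them, and CRUSH itself was placing the PG correctly all along.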