Re: [ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'

2018-02-23 Thread Wido den Hollander



On 02/23/2018 12:42 AM, Mike Lovell wrote:
was the pg-upmap feature used to force a pg to get mapped to a 
particular osd?




Yes it was. This is a semi-production cluster where the balancer module 
has been enabled with the upmap feature.
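
(That was turned on with the usual Luminous mgr commands, something like 
the following, going from memory:

$ ceph mgr module enable balancer
$ ceph balancer mode upmap
$ ceph balancer on

The cluster already has require_min_compat_client set to luminous, which 
the upmap feature needs.)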


It seems it remapped PGs to OSDs on the same host.

root@man:~# ceph osd dump|grep pg_upmap|grep 1.41
pg_upmap_items 1.41 [9,15,11,7,10,2]
root@man:~#

I don't know exactly what to read from that output, but it does seem to 
be the case here.
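
If I read it correctly, pg_upmap_items stores (from, to) OSD pairs, so the 
15 in the up set most likely comes from one of those remappings rather 
than from CRUSH itself. A quick way to check whether a PG's up set ends up 
with two OSDs on the same host (rough sketch, JSON field names from memory):

# any output below means two OSDs of this PG share a host
PG=1.41
for osd in $(ceph pg map $PG -f json | jq -r '.up[]'); do
    ceph osd find $osd | jq -r '.crush_location.host'
done | sort | uniq -d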


I removed the upmap entry for this PG, which fixed it:

$ ceph osd rm-pg-upmap-items 1.41

I also disabled the balancer for now (will report an issue) and removed 
all other upmap entries:


$ ceph osd dump | grep pg_upmap_items | awk '{print $2}' | \
      xargs -n 1 ceph osd rm-pg-upmap-items
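
For anyone cleaning up the same way: disabling the balancer first 
('ceph balancer off') keeps it from re-creating the entries right away, 
and a slightly more verbose loop makes it easier to see what is being 
removed (rough sketch of the same cleanup):

ceph balancer off
ceph osd dump | awk '/^pg_upmap_items/ {print $2}' | while read pg; do
    echo "removing pg_upmap_items for $pg"
    ceph osd rm-pg-upmap-items "$pg"
done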


Thanks for the hint!

Wido


mike

On Thu, Feb 22, 2018 at 10:28 AM, Wido den Hollander wrote:


Hi,

I have a situation with a cluster which was recently upgraded to
Luminous and has a PG mapped to OSDs on the same host.

root@man:~# ceph pg map 1.41
osdmap e21543 pg 1.41 (1.41) -> up [15,7,4] acting [15,7,4]
root@man:~#

root@man:~# ceph osd find 15|jq -r '.crush_location.host'
n02
root@man:~# ceph osd find 7|jq -r '.crush_location.host'
n01
root@man:~# ceph osd find 4|jq -r '.crush_location.host'
n02
root@man:~#

As you can see, OSD 15 and 4 are both on the host 'n02'.

This PG went inactive when the machine hosting both OSDs went down
for maintenance.

My first suspect was the CRUSHMap and the rules, but those are fine:

rule replicated_ruleset {
         id 0
         type replicated
         min_size 1
         max_size 10
         step take default
         step chooseleaf firstn 0 type host
         step emit
}

This is the only rule in the CRUSHMap.

ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       19.50325 root default
-2        2.78618     host n01
  5   ssd  0.92999         osd.5      up  1.0 1.0
  7   ssd  0.92619         osd.7      up  1.0 1.0
14   ssd  0.92999         osd.14     up  1.0 1.0
-3        2.78618     host n02
  4   ssd  0.92999         osd.4      up  1.0 1.0
  8   ssd  0.92619         osd.8      up  1.0 1.0
15   ssd  0.92999         osd.15     up  1.0 1.0
-4        2.78618     host n03
  3   ssd  0.92999         osd.3      up  0.94577 1.0
  9   ssd  0.92619         osd.9      up  0.82001 1.0
16   ssd  0.92999         osd.16     up  0.84885 1.0
-5        2.78618     host n04
  2   ssd  0.92999         osd.2      up  0.93501 1.0
10   ssd  0.92619         osd.10     up  0.76031 1.0
17   ssd  0.92999         osd.17     up  0.82883 1.0
-6        2.78618     host n05
  6   ssd  0.92999         osd.6      up  0.84470 1.0
11   ssd  0.92619         osd.11     up  0.80530 1.0
18   ssd  0.92999         osd.18     up  0.86501 1.0
-7        2.78618     host n06
  1   ssd  0.92999         osd.1      up  0.88353 1.0
12   ssd  0.92619         osd.12     up  0.79602 1.0
19   ssd  0.92999         osd.19     up  0.83171 1.0
-8        2.78618     host n07
  0   ssd  0.92999         osd.0      up  1.0 1.0
13   ssd  0.92619         osd.13     up  0.86043 1.0
20   ssd  0.92999         osd.20     up  0.77153 1.0

Here you see osd.15 and osd.4 on the same host 'n02'.

This cluster was upgraded from Hammer to Jewel and now Luminous and
it doesn't have the latest tunables yet, but should that matter? I
never encountered this before.

tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

I don't want to touch this yet in case this is a bug or glitch 
in the matrix somewhere.

I hope it's just an admin mistake, but so far I'm not able to find a 
clue pointing to that.

root@man:~# ceph osd dump|head -n 12
epoch 21545
fsid 0b6fb388-6233-4eeb-a55c-476ed12bdf0a
created 2015-04-28 14:43:53.950159
modified 2018-02-22 17:56:42.497849
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 22
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release luminous
root@man:~#

I also downloaded the CRUSHmap and ran crushtool with --test and
--show-mappings, but that didn't show any PG mapped to the same host.

Any ideas on what might be going on here?

Wido

Re: [ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'

2018-02-22 Thread Mike Lovell
was the pg-upmap feature used to force a pg to get mapped to a particular
osd?

mike

On Thu, Feb 22, 2018 at 10:28 AM, Wido den Hollander wrote:

> Hi,
>
> I have a situation with a cluster which was recently upgraded to Luminous
> and has a PG mapped to OSDs on the same host.
>
> root@man:~# ceph pg map 1.41
> osdmap e21543 pg 1.41 (1.41) -> up [15,7,4] acting [15,7,4]
> root@man:~#
>
> root@man:~# ceph osd find 15|jq -r '.crush_location.host'
> n02
> root@man:~# ceph osd find 7|jq -r '.crush_location.host'
> n01
> root@man:~# ceph osd find 4|jq -r '.crush_location.host'
> n02
> root@man:~#
>
> As you can see, OSD 15 and 4 are both on the host 'n02'.
>
> This PG went inactive when the machine hosting both OSDs went down for
> maintenance.
>
> My first suspect was the CRUSHMap and the rules, but those are fine:
>
> rule replicated_ruleset {
>         id 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> This is the only rule in the CRUSHMap.
>
> ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
> -1       19.50325 root default
> -2        2.78618     host n01
>  5   ssd  0.92999         osd.5      up  1.0 1.0
>  7   ssd  0.92619         osd.7      up  1.0 1.0
> 14   ssd  0.92999         osd.14     up  1.0 1.0
> -3        2.78618     host n02
>  4   ssd  0.92999         osd.4      up  1.0 1.0
>  8   ssd  0.92619         osd.8      up  1.0 1.0
> 15   ssd  0.92999         osd.15     up  1.0 1.0
> -4        2.78618     host n03
>  3   ssd  0.92999         osd.3      up  0.94577 1.0
>  9   ssd  0.92619         osd.9      up  0.82001 1.0
> 16   ssd  0.92999         osd.16     up  0.84885 1.0
> -5        2.78618     host n04
>  2   ssd  0.92999         osd.2      up  0.93501 1.0
> 10   ssd  0.92619         osd.10     up  0.76031 1.0
> 17   ssd  0.92999         osd.17     up  0.82883 1.0
> -6        2.78618     host n05
>  6   ssd  0.92999         osd.6      up  0.84470 1.0
> 11   ssd  0.92619         osd.11     up  0.80530 1.0
> 18   ssd  0.92999         osd.18     up  0.86501 1.0
> -7        2.78618     host n06
>  1   ssd  0.92999         osd.1      up  0.88353 1.0
> 12   ssd  0.92619         osd.12     up  0.79602 1.0
> 19   ssd  0.92999         osd.19     up  0.83171 1.0
> -8        2.78618     host n07
>  0   ssd  0.92999         osd.0      up  1.0 1.0
> 13   ssd  0.92619         osd.13     up  0.86043 1.0
> 20   ssd  0.92999         osd.20     up  0.77153 1.0
>
> Here you see osd.15 and osd.4 on the same host 'n02'.
>
> This cluster was upgraded from Hammer to Jewel and now Luminous and it
> doesn't have the latest tunables yet, but should that matter? I never
> encountered this before.
>
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable chooseleaf_stable 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> I don't want to touch this yet in case this is a bug or glitch in the
> matrix somewhere.
>
> I hope it's just an admin mistake, but so far I'm not able to find a clue
> pointing to that.
>
> root@man:~# ceph osd dump|head -n 12
> epoch 21545
> fsid 0b6fb388-6233-4eeb-a55c-476ed12bdf0a
> created 2015-04-28 14:43:53.950159
> modified 2018-02-22 17:56:42.497849
> flags sortbitwise,recovery_deletes,purged_snapdirs
> crush_version 22
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client luminous
> min_compat_client luminous
> require_osd_release luminous
> root@man:~#
>
> I also downloaded the CRUSHmap and ran crushtool with --test and
> --show-mappings, but that didn't show any PG mapped to the same host.
>
> Any ideas on what might be going on here?
>
> Wido


Re: [ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'

2018-02-22 Thread Gregory Farnum
On Thu, Feb 22, 2018 at 9:29 AM Wido den Hollander wrote:

> Hi,
>
> I have a situation with a cluster which was recently upgraded to
> Luminous and has a PG mapped to OSDs on the same host.
>
> root@man:~# ceph pg map 1.41
> osdmap e21543 pg 1.41 (1.41) -> up [15,7,4] acting [15,7,4]
> root@man:~#
>
> root@man:~# ceph osd find 15|jq -r '.crush_location.host'
> n02
> root@man:~# ceph osd find 7|jq -r '.crush_location.host'
> n01
> root@man:~# ceph osd find 4|jq -r '.crush_location.host'
> n02
> root@man:~#
>
> As you can see, OSD 15 and 4 are both on the host 'n02'.
>
> This PG went inactive when the machine hosting both OSDs went down for
> maintenance.
>
> My first suspect was the CRUSHMap and the rules, but those are fine:
>
> rule replicated_ruleset {
>         id 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> This is the only rule in the CRUSHMap.
>
> ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
> -1       19.50325 root default
> -2        2.78618     host n01
>  5   ssd  0.92999         osd.5      up  1.0 1.0
>  7   ssd  0.92619         osd.7      up  1.0 1.0
> 14   ssd  0.92999         osd.14     up  1.0 1.0
> -3        2.78618     host n02
>  4   ssd  0.92999         osd.4      up  1.0 1.0
>  8   ssd  0.92619         osd.8      up  1.0 1.0
> 15   ssd  0.92999         osd.15     up  1.0 1.0
> -4        2.78618     host n03
>  3   ssd  0.92999         osd.3      up  0.94577 1.0
>  9   ssd  0.92619         osd.9      up  0.82001 1.0
> 16   ssd  0.92999         osd.16     up  0.84885 1.0
> -5        2.78618     host n04
>  2   ssd  0.92999         osd.2      up  0.93501 1.0
> 10   ssd  0.92619         osd.10     up  0.76031 1.0
> 17   ssd  0.92999         osd.17     up  0.82883 1.0
> -6        2.78618     host n05
>  6   ssd  0.92999         osd.6      up  0.84470 1.0
> 11   ssd  0.92619         osd.11     up  0.80530 1.0
> 18   ssd  0.92999         osd.18     up  0.86501 1.0
> -7        2.78618     host n06
>  1   ssd  0.92999         osd.1      up  0.88353 1.0
> 12   ssd  0.92619         osd.12     up  0.79602 1.0
> 19   ssd  0.92999         osd.19     up  0.83171 1.0
> -8        2.78618     host n07
>  0   ssd  0.92999         osd.0      up  1.0 1.0
> 13   ssd  0.92619         osd.13     up  0.86043 1.0
> 20   ssd  0.92999         osd.20     up  0.77153 1.0
>
> Here you see osd.15 and osd.4 on the same host 'n02'.
>
> This cluster was upgraded from Hammer to Jewel and now Luminous and it
> doesn't have the latest tunables yet, but should that matter? I never
> encountered this before.
>
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable chooseleaf_stable 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> I don't want to touch this yet in case this is a bug or glitch in
> the matrix somewhere.
>
> I hope it's just an admin mistake, but so far I'm not able to find a clue
> pointing to that.
>
> root@man:~# ceph osd dump|head -n 12
> epoch 21545
> fsid 0b6fb388-6233-4eeb-a55c-476ed12bdf0a
> created 2015-04-28 14:43:53.950159
> modified 2018-02-22 17:56:42.497849
> flags sortbitwise,recovery_deletes,purged_snapdirs
> crush_version 22
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client luminous
> min_compat_client luminous
> require_osd_release luminous
> root@man:~#
>
> I also downloaded the CRUSHmap and ran crushtool with --test and
> --show-mappings, but that didn't show any PG mapped to the same host.
>

What *was* the mapping for the PG in question, then?

At a first guess, it sounds to me like CRUSH is failing to map the
appropriate number of participants on this PG, so one of the extant OSDs
from a prior epoch is getting drafted. I would expect this to show up as a
remapped PG.
-Greg


>
> Any ideas on what might be going on here?
>
> Wido


[ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'

2018-02-22 Thread Wido den Hollander

Hi,

I have a situation with a cluster which was recently upgraded to 
Luminous and has a PG mapped to OSDs on the same host.


root@man:~# ceph pg map 1.41
osdmap e21543 pg 1.41 (1.41) -> up [15,7,4] acting [15,7,4]
root@man:~#

root@man:~# ceph osd find 15|jq -r '.crush_location.host'
n02
root@man:~# ceph osd find 7|jq -r '.crush_location.host'
n01
root@man:~# ceph osd find 4|jq -r '.crush_location.host'
n02
root@man:~#

As you can see, OSD 15 and 4 are both on the host 'n02'.

This PG went inactive when the machine hosting both OSDs went down for 
maintenance.


My first suspect was the CRUSHMap and the rules, but those are fine:

rule replicated_ruleset {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

This is the only rule in the CRUSHMap.

ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       19.50325 root default
-2        2.78618     host n01
 5   ssd  0.92999         osd.5      up  1.0 1.0
 7   ssd  0.92619         osd.7      up  1.0 1.0
14   ssd  0.92999         osd.14     up  1.0 1.0
-3        2.78618     host n02
 4   ssd  0.92999         osd.4      up  1.0 1.0
 8   ssd  0.92619         osd.8      up  1.0 1.0
15   ssd  0.92999         osd.15     up  1.0 1.0
-4        2.78618     host n03
 3   ssd  0.92999         osd.3      up  0.94577 1.0
 9   ssd  0.92619         osd.9      up  0.82001 1.0
16   ssd  0.92999         osd.16     up  0.84885 1.0
-5        2.78618     host n04
 2   ssd  0.92999         osd.2      up  0.93501 1.0
10   ssd  0.92619         osd.10     up  0.76031 1.0
17   ssd  0.92999         osd.17     up  0.82883 1.0
-6        2.78618     host n05
 6   ssd  0.92999         osd.6      up  0.84470 1.0
11   ssd  0.92619         osd.11     up  0.80530 1.0
18   ssd  0.92999         osd.18     up  0.86501 1.0
-7        2.78618     host n06
 1   ssd  0.92999         osd.1      up  0.88353 1.0
12   ssd  0.92619         osd.12     up  0.79602 1.0
19   ssd  0.92999         osd.19     up  0.83171 1.0
-8        2.78618     host n07
 0   ssd  0.92999         osd.0      up  1.0 1.0
13   ssd  0.92619         osd.13     up  0.86043 1.0
20   ssd  0.92999         osd.20     up  0.77153 1.0

Here you see osd.15 and osd.4 on the same host 'n02'.

This cluster was upgraded from Hammer to Jewel and now Luminous and it 
doesn't have the latest tunables yet, but should that matter? I never 
encountered this before.


tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
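
As a side note, 'ceph osd crush show-tunables' prints the active tunables 
as JSON, including which profile they correspond to, without having to 
decompile the map:

$ ceph osd crush show-tunables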

I don't want to touch this yet in case this is a bug or glitch in 
the matrix somewhere.


I hope it's just an admin mistake, but so far I'm not able to find a clue 
pointing to that.


root@man:~# ceph osd dump|head -n 12
epoch 21545
fsid 0b6fb388-6233-4eeb-a55c-476ed12bdf0a
created 2015-04-28 14:43:53.950159
modified 2018-02-22 17:56:42.497849
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 22
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release luminous
root@man:~#

I also downloaded the CRUSHmap and ran crushtool with --test and 
--show-mappings, but that didn't show any PG mapped to the same host.
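
For completeness, that check was roughly the following (a sketch; rule id 0 
and 3 replicas, as used on this cluster):

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-mappings | head
$ crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings

Note that crushtool only exercises the CRUSH map itself; pg-upmap entries 
live in the OSDMap and are applied on top of the CRUSH result, so they would 
never show up in this kind of test.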


Any ideas on what might be going on here?

Wido