Re: [ceph-users] backfill_toofull after adding new OSDs

2019-03-06 Thread Paul Emmerich
Yeah, this happens all the time during backfilling since Mimic and is
some kind of bug.
It will always resolve itself, but it's still quite annoying.
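
If you want to double-check that you are hitting this cosmetic variant and not a
genuinely full OSD, something like the following is usually enough (just a rough
sketch, not an exact recipe):

# ceph health detail
# ceph pg ls backfill_toofull
# ceph osd df tree

The first two list the affected PG(s) and show that they are merely waiting to
backfill; the last one shows per-OSD utilisation, which with this bug stays far
below the backfillfull ratio.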


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Mar 6, 2019 at 3:33 PM Simon Ironside  wrote:
>
> I've just seen this when *removing* an OSD too.
> Issue resolved itself during recovery. OSDs were not full, not even
> close, there's virtually nothing on this cluster.
> Mimic 13.2.4 on RHEL 7.6. OSDs are all Bluestore HDD with SSD DBs.
> Everything is otherwise default.
>
>cluster:
>  id: MY ID
>  health: HEALTH_ERR
>  1161/66039 objects misplaced (1.758%)
>  Degraded data redundancy: 220095/66039 objects degraded
> (333.280%), 137 pgs degraded
>  Degraded data redundancy (low space): 1 pg backfill_toofull
>
>services:
>  mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
>  mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
>  osd: 53 osds: 52 up, 52 in; 186 remapped pgs
>
>data:
>  pools:   16 pools, 2016 pgs
>  objects: 22.01 k objects, 83 GiB
>  usage:   7.9 TiB used, 473 TiB / 481 TiB avail
>  pgs: 220095/66039 objects degraded (333.280%)
>   1161/66039 objects misplaced (1.758%)
>   1830 active+clean
>   134  active+recovery_wait+undersized+degraded+remapped
>   45   active+remapped+backfill_wait
>   3    active+recovering+undersized+remapped
>   3    active+recovery_wait+undersized+degraded
>   1    active+remapped+backfill_wait+backfill_toofull
>
>io:
>  client:   60 KiB/s wr, 0 op/s rd, 5 op/s wr
>  recovery: 8.6 MiB/s, 110 objects/s
>
>
> On 07/02/2019 04:26, Brad Hubbard wrote:
> > Let's try to restrict discussion to the original thread
> > "backfill_toofull while OSDs are not full" and get a tracker opened up
> > for this issue.
> >
>


Re: [ceph-users] backfill_toofull after adding new OSDs

2019-03-06 Thread Simon Ironside

I've just seen this when *removing* an OSD too.
Issue resolved itself during recovery. OSDs were not full, not even 
close, there's virtually nothing on this cluster.
Mimic 13.2.4 on RHEL 7.6. OSDs are all Bluestore HDD with SSD DBs. 
Everything is otherwise default.


  cluster:
    id: MY ID
    health: HEALTH_ERR
    1161/66039 objects misplaced (1.758%)
    Degraded data redundancy: 220095/66039 objects degraded 
(333.280%), 137 pgs degraded

    Degraded data redundancy (low space): 1 pg backfill_toofull

  services:
    mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
    mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
    osd: 53 osds: 52 up, 52 in; 186 remapped pgs

  data:
    pools:   16 pools, 2016 pgs
    objects: 22.01 k objects, 83 GiB
    usage:   7.9 TiB used, 473 TiB / 481 TiB avail
    pgs: 220095/66039 objects degraded (333.280%)
 1161/66039 objects misplaced (1.758%)
 1830 active+clean
 134  active+recovery_wait+undersized+degraded+remapped
 45   active+remapped+backfill_wait
 3    active+recovering+undersized+remapped
 3    active+recovery_wait+undersized+degraded
 1    active+remapped+backfill_wait+backfill_toofull

  io:
    client:   60 KiB/s wr, 0 op/s rd, 5 op/s wr
    recovery: 8.6 MiB/s, 110 objects/s


On 07/02/2019 04:26, Brad Hubbard wrote:

Let's try to restrict discussion to the original thread
"backfill_toofull while OSDs are not full" and get a tracker opened up
for this issue.





Re: [ceph-users] backfill_toofull after adding new OSDs

2019-02-06 Thread Brad Hubbard
Let's try to restrict discussion to the original thread
"backfill_toofull while OSDs are not full" and get a tracker opened up
for this issue.

On Sat, Feb 2, 2019 at 11:52 AM Fyodor Ustinov  wrote:
>
> Hi!
>
> Right now, after adding an OSD:
>
> # ceph health detail
> HEALTH_ERR 74197563/199392333 objects misplaced (37.212%); Degraded data 
> redundancy (low space): 1 pg backfill_toofull
> OBJECT_MISPLACED 74197563/199392333 objects misplaced (37.212%)
> PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
> pg 6.eb is active+remapped+backfill_wait+backfill_toofull, acting 
> [21,0,47]
>
> # ceph pg ls-by-pool iscsi backfill_toofull
> PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES  LOG  STATE 
>  STATE_STAMPVERSION   REPORTED   UP   
>   ACTING   SCRUB_STAMPDEEP_SCRUB_STAMP
> 6.eb 6450  1290   0 1645654016 3067 
> active+remapped+backfill_wait+backfill_toofull 2019-02-02 00:20:32.975300 
> 7208'6567 9790:16214 [5,1,21]p5 [21,0,47]p21 2019-01-18 04:13:54.280495 
> 2019-01-18 04:13:54.280495
>
> All OSDs are below 40% use.
>
> ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
>  0   hdd 9.56149  1.0 9.6 TiB 3.2 TiB 6.3 TiB 33.64 1.31 313
>  1   hdd 9.56149  1.0 9.6 TiB 3.3 TiB 6.3 TiB 34.13 1.33 295
>  5   hdd 9.56149  1.0 9.6 TiB 756 GiB 8.8 TiB  7.72 0.30 103
> 47   hdd 9.32390  1.0 9.3 TiB 3.1 TiB 6.2 TiB 33.75 1.31 306
>
> (all other OSDs are also below 40% use)
>
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
>
> Perhaps the developers will take note of this message and comment?
>
> - Original Message -
> From: "Fyodor Ustinov" 
> To: "Caspar Smit" 
> Cc: "Jan Kasprzak" , "ceph-users" 
> Sent: Thursday, 31 January, 2019 16:50:24
> Subject: Re: [ceph-users] backfill_toofull after adding new OSDs
>
> Hi!
>
> I have seen the same thing several times when adding a new OSD to the cluster: one or
> two PGs in the "backfill_toofull" state.
>
> This happens in all versions of Mimic.
>
> - Original Message -
> From: "Caspar Smit" 
> To: "Jan Kasprzak" 
> Cc: "ceph-users" 
> Sent: Thursday, 31 January, 2019 15:43:07
> Subject: Re: [ceph-users] backfill_toofull after adding new OSDs
>
> Hi Jan,
>
> You might be hitting the same issue as Wido here:
>
> https://www.spinics.net/lists/ceph-users/msg50603.html
>
> Kind regards,
> Caspar
>
> On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote:
>
>
> Hello, ceph users,
>
> I see the following HEALTH_ERR during cluster rebalance:
>
> Degraded data redundancy (low space): 8 pgs backfill_toofull
>
> Detailed description:
> I have upgraded my cluster to mimic and added 16 new bluestore OSDs
> on 4 hosts. The hosts are in a separate region in my crush map, and crush
> rules prevented data from being moved onto the new OSDs. Now I want to move
> all data to the new OSDs (and possibly decommission the old filestore OSDs).
> I have created the following rule:
>
> # ceph osd crush rule create-replicated on-newhosts newhostsroot host
>
> after this, I am slowly moving the pools one-by-one to this new rule:
>
> # ceph osd pool set test-hdd-pool crush_rule on-newhosts
>
> When I do this, I get the above error. This is misleading, because
> ceph osd df does not suggest the OSDs are getting full (the most full
> OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR
> disappears. Why am I getting this error?
>
> # ceph -s
> cluster:
> id: ...my UUID...
> health: HEALTH_ERR
> 1271/3803223 objects misplaced (0.033%)
> Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
> degraded, 67 pgs undersized
> Degraded data redundancy (low space): 8 pgs backfill_toofull
>
> services:
> mon: 3 daemons, quorum mon1,mon2,mon3
> mgr: mon2(active), standbys: mon1, mon3
> osd: 80 osds: 80 up, 80 in; 90 remapped pgs
> rgw: 1 daemon active
>
> data:
> pools: 13 pools, 5056 pgs
> objects: 1.27 M objects, 4.8 TiB
> usage: 15 TiB used, 208 TiB / 224 TiB avail
> pgs: 40124/3803223 objects degraded (1.055%)
> 1271/3803223 objects misplaced (0.033%)
> 4963 active+clean
> 41 active+recovery_wait+undersized+degraded+remapped
> 21 active+recovery_wait+undersized+degraded
> 17 active+remapped+backfill_wait
> 5 active+remapped+backfill_wait+backfill_toofull
> 3 active+remapped+backfill_toofull
> 2 active+recovering+undersized+remapped
> 2 active+recovering+undersized+degraded+remapped

Re: [ceph-users] backfill_toofull after adding new OSDs

2019-02-01 Thread Fyodor Ustinov
Hi!

Right now, after adding an OSD:

# ceph health detail
HEALTH_ERR 74197563/199392333 objects misplaced (37.212%); Degraded data 
redundancy (low space): 1 pg backfill_toofull
OBJECT_MISPLACED 74197563/199392333 objects misplaced (37.212%)
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
pg 6.eb is active+remapped+backfill_wait+backfill_toofull, acting [21,0,47]

# ceph pg ls-by-pool iscsi backfill_toofull
PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES  LOG  STATE   
   STATE_STAMPVERSION   REPORTED   UP   
  ACTING   SCRUB_STAMPDEEP_SCRUB_STAMP
6.eb 6450  1290   0 1645654016 3067 
active+remapped+backfill_wait+backfill_toofull 2019-02-02 00:20:32.975300 
7208'6567 9790:16214 [5,1,21]p5 [21,0,47]p21 2019-01-18 04:13:54.280495 
2019-01-18 04:13:54.280495

All OSDs are below 40% use.

ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
 0   hdd 9.56149  1.0 9.6 TiB 3.2 TiB 6.3 TiB 33.64 1.31 313
 1   hdd 9.56149  1.0 9.6 TiB 3.3 TiB 6.3 TiB 34.13 1.33 295
 5   hdd 9.56149  1.0 9.6 TiB 756 GiB 8.8 TiB  7.72 0.30 103
47   hdd 9.32390  1.0 9.3 TiB 3.1 TiB 6.2 TiB 33.75 1.31 306

(all other OSDs are also below 40% use)

ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
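
For completeness, querying the flagged PG directly shows what it is waiting on and
which OSDs are involved, which can then be cross-checked against per-OSD usage (a
sketch only; the PG ID and OSD sets are the ones from the output above):

# ceph pg 6.eb query > /tmp/pg-6.eb.json
# ceph osd df

The recovery_state section of the query output shows what the PG is waiting on, and
ceph osd df confirms that the OSDs in the up [5,1,21] and acting [21,0,47] sets are
nowhere near the backfillfull ratio.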

Perhaps the developers will take note of this message and comment?

- Original Message -
From: "Fyodor Ustinov" 
To: "Caspar Smit" 
Cc: "Jan Kasprzak" , "ceph-users" 
Sent: Thursday, 31 January, 2019 16:50:24
Subject: Re: [ceph-users] backfill_toofull after adding new OSDs

Hi!

I have seen the same thing several times when adding a new OSD to the cluster: one or two
PGs in the "backfill_toofull" state.

This happens in all versions of Mimic.

- Original Message -
From: "Caspar Smit" 
To: "Jan Kasprzak" 
Cc: "ceph-users" 
Sent: Thursday, 31 January, 2019 15:43:07
Subject: Re: [ceph-users] backfill_toofull after adding new OSDs

Hi Jan, 

You might be hitting the same issue as Wido here: 

https://www.spinics.net/lists/ceph-users/msg50603.html

Kind regards, 
Caspar 

On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote:


Hello, ceph users, 

I see the following HEALTH_ERR during cluster rebalance: 

Degraded data redundancy (low space): 8 pgs backfill_toofull 

Detailed description: 
I have upgraded my cluster to mimic and added 16 new bluestore OSDs 
on 4 hosts. The hosts are in a separate region in my crush map, and crush 
rules prevented data from being moved onto the new OSDs. Now I want to move
all data to the new OSDs (and possibly decommission the old filestore OSDs).
I have created the following rule: 

# ceph osd crush rule create-replicated on-newhosts newhostsroot host 

after this, I am slowly moving the pools one-by-one to this new rule: 

# ceph osd pool set test-hdd-pool crush_rule on-newhosts 

When I do this, I get the above error. This is misleading, because 
ceph osd df does not suggest the OSDs are getting full (the most full 
OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR 
disappears. Why am I getting this error? 

# ceph -s 
cluster: 
id: ...my UUID... 
health: HEALTH_ERR 
1271/3803223 objects misplaced (0.033%) 
Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
degraded, 67 pgs undersized 
Degraded data redundancy (low space): 8 pgs backfill_toofull 

services: 
mon: 3 daemons, quorum mon1,mon2,mon3 
mgr: mon2(active), standbys: mon1, mon3 
osd: 80 osds: 80 up, 80 in; 90 remapped pgs 
rgw: 1 daemon active 

data: 
pools: 13 pools, 5056 pgs 
objects: 1.27 M objects, 4.8 TiB 
usage: 15 TiB used, 208 TiB / 224 TiB avail 
pgs: 40124/3803223 objects degraded (1.055%) 
1271/3803223 objects misplaced (0.033%) 
4963 active+clean 
41 active+recovery_wait+undersized+degraded+remapped 
21 active+recovery_wait+undersized+degraded 
17 active+remapped+backfill_wait 
5 active+remapped+backfill_wait+backfill_toofull 
3 active+remapped+backfill_toofull 
2 active+recovering+undersized+remapped 
2 active+recovering+undersized+degraded+remapped 
1 active+clean+remapped 
1 active+recovering+undersized+degraded 

io: 
client: 6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr 
recovery: 2.0 MiB/s, 92 objects/s 

Thanks for any hint, 

-Yenya 

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
This is the world we live in: the way to deal with computers is to google 
the symptoms, and hope that you don't have to watch a video. --P. Zaitcev 

Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Jan Kasprzak
OK, now I have changed the crush rule on a pool with the real data as well,
and it seems all the client I/O on that pool has stopped.
The recovery continues, but things like qemu I/O, "rbd ls", and so on
are just stuck doing nothing.

Can I unstick it somehow (faster than waiting for all the recovery
to finish)? Thanks.
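
For the archives, these are the knobs I am aware of for pushing the backfill along;
I have not verified that they help with the stuck client I/O here, so treat this as
a sketch only (the pgid is a placeholder):

# ceph pg dump_stuck inactive
# ceph tell 'osd.*' injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'
# ceph pg force-backfill <pgid>

The first lists the PGs stuck activating, the second temporarily raises the per-OSD
backfill/recovery concurrency, and the third (available in Mimic) bumps the priority
of a specific PG.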

# ceph -s
  cluster:
id: ... my-uuid ...
health: HEALTH_ERR
3308311/3803892 objects misplaced (86.972%)
Reduced data availability: 1721 pgs inactive
Degraded data redundancy: 85361/3803892 objects degraded (2.244%), 139 pgs degraded, 139 pgs undersized
Degraded data redundancy (low space): 25 pgs backfill_toofull

  services:
mon: 3 daemons, quorum mon1,mon2,mon3
mgr: mon2(active), standbys: mon1, mon3
osd: 80 osds: 80 up, 80 in; 1868 remapped pgs
rgw: 1 daemon active

  data:
pools:   13 pools, 5056 pgs
objects: 1.27 M objects, 4.8 TiB
usage:   15 TiB used, 208 TiB / 224 TiB avail
pgs: 34.039% pgs not active
 85361/3803892 objects degraded (2.244%)
 3308311/3803892 objects misplaced (86.972%)
 3188 active+clean
 1582 activating+remapped
 139  activating+undersized+degraded+remapped
 93   active+remapped+backfill_wait
 29   active+remapped+backfilling
 25   active+remapped+backfill_wait+backfill_toofull

  io:
recovery: 174 MiB/s, 43 objects/s


-Yenya


Jan Kasprzak wrote:
: : - Original Message -
: : From: "Caspar Smit" 
: : To: "Jan Kasprzak" 
: : Cc: "ceph-users" 
: : Sent: Thursday, 31 January, 2019 15:43:07
: : Subject: Re: [ceph-users] backfill_toofull after adding new OSDs
: : 
: : Hi Jan, 
: : 
: : You might be hitting the same issue as Wido here: 
: : 
: : https://www.spinics.net/lists/ceph-users/msg50603.html
: : 
: : Kind regards, 
: : Caspar 
: : 
: : On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote:
: : 
: : 
: : Hello, ceph users, 
: : 
: : I see the following HEALTH_ERR during cluster rebalance: 
: : 
: : Degraded data redundancy (low space): 8 pgs backfill_toofull 
: : 
: : Detailed description: 
: : I have upgraded my cluster to mimic and added 16 new bluestore OSDs 
: : on 4 hosts. The hosts are in a separate region in my crush map, and crush 
: : rules prevented data from being moved onto the new OSDs. Now I want to move
: : all data to the new OSDs (and possibly decommission the old filestore OSDs).
: : I have created the following rule: 
: : 
: : # ceph osd crush rule create-replicated on-newhosts newhostsroot host 
: : 
: : after this, I am slowly moving the pools one-by-one to this new rule: 
: : 
: : # ceph osd pool set test-hdd-pool crush_rule on-newhosts 
: : 
: : When I do this, I get the above error. This is misleading, because 
: : ceph osd df does not suggest the OSDs are getting full (the most full 
: : OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR 
: : disappears. Why am I getting this error? 
: : 
: : # ceph -s 
: : cluster: 
: : id: ...my UUID... 
: : health: HEALTH_ERR 
: : 1271/3803223 objects misplaced (0.033%) 
: : Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
degraded, 67 pgs undersized 
: : Degraded data redundancy (low space): 8 pgs backfill_toofull 
: : 
: : services: 
: : mon: 3 daemons, quorum mon1,mon2,mon3 
: : mgr: mon2(active), standbys: mon1, mon3 
: : osd: 80 osds: 80 up, 80 in; 90 remapped pgs 
: : rgw: 1 daemon active 
: : 
: : data: 
: : pools: 13 pools, 5056 pgs 
: : objects: 1.27 M objects, 4.8 TiB 
: : usage: 15 TiB used, 208 TiB / 224 TiB avail 
: : pgs: 40124/3803223 objects degraded (1.055%) 
: : 1271/3803223 objects misplaced (0.033%) 
: : 4963 active+clean 
: : 41 active+recovery_wait+undersized+degraded+remapped 
: : 21 active+recovery_wait+undersized+degraded 
: : 17 active+remapped+backfill_wait 
: : 5 active+remapped+backfill_wait+backfill_toofull 
: : 3 active+remapped+backfill_toofull 
: : 2 active+recovering+undersized+remapped 
: : 2 active+recovering+undersized+degraded+remapped 
: : 1 active+clean+remapped 
: : 1 active+recovering+undersized+degraded 
: : 
: : io: 
: : client: 6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr 
: : recovery: 2.0 MiB/s, 92 objects/s 
: : 
: : Thanks for any hint, 
: : 
: : -Yenya 
: : 
: : -- 
: : | Jan "Yenya" Kasprzak  |
: : | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
: : This is the world we live in: the way to deal with computers is to google 
: : the symptoms, and hope that you don't have to watch a video. --P. Zaitcev 

Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Jan Kasprzak
Fyodor Ustinov wrote:
: Hi!
: 
: I have seen the same thing several times when adding a new OSD to the cluster: one or
: two PGs in the "backfill_toofull" state.
: 
: This happens in all versions of Mimic.

Yep. In my case it is not (only) after adding the new OSDs.
An hour or so ago my cluster reached the HEALTH_OK state, so I moved
another pool to the new hosts with "crush_rule on-newhosts". The result
was an immediate backfill_toofull on two PGs for about five minutes,
and then the cluster reached HEALTH_OK again.

So the PGs are not stuck in that state forever, they are there
only during the data reshuffle.
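
(In case it is useful to anyone: I just kept an eye on it with something along the
lines of the following while the pool was moving; nothing more clever than that.)

# watch -n 30 'ceph health detail | head -n 20'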

13.2.4 on CentOS 7.

-Yenya

: 
: - Original Message -
: From: "Caspar Smit" 
: To: "Jan Kasprzak" 
: Cc: "ceph-users" 
: Sent: Thursday, 31 January, 2019 15:43:07
: Subject: Re: [ceph-users] backfill_toofull after adding new OSDs
: 
: Hi Jan, 
: 
: You might be hitting the same issue as Wido here: 
: 
: https://www.spinics.net/lists/ceph-users/msg50603.html
: 
: Kind regards, 
: Caspar 
: 
: On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote:
: 
: 
: Hello, ceph users, 
: 
: I see the following HEALTH_ERR during cluster rebalance: 
: 
: Degraded data redundancy (low space): 8 pgs backfill_toofull 
: 
: Detailed description: 
: I have upgraded my cluster to mimic and added 16 new bluestore OSDs 
: on 4 hosts. The hosts are in a separate region in my crush map, and crush 
: rules prevented data from being moved onto the new OSDs. Now I want to move
: all data to the new OSDs (and possibly decommission the old filestore OSDs).
: I have created the following rule: 
: 
: # ceph osd crush rule create-replicated on-newhosts newhostsroot host 
: 
: after this, I am slowly moving the pools one-by-one to this new rule: 
: 
: # ceph osd pool set test-hdd-pool crush_rule on-newhosts 
: 
: When I do this, I get the above error. This is misleading, because 
: ceph osd df does not suggest the OSDs are getting full (the most full 
: OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR 
: disappears. Why am I getting this error? 
: 
: # ceph -s 
: cluster: 
: id: ...my UUID... 
: health: HEALTH_ERR 
: 1271/3803223 objects misplaced (0.033%) 
: Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
degraded, 67 pgs undersized 
: Degraded data redundancy (low space): 8 pgs backfill_toofull 
: 
: services: 
: mon: 3 daemons, quorum mon1,mon2,mon3 
: mgr: mon2(active), standbys: mon1, mon3 
: osd: 80 osds: 80 up, 80 in; 90 remapped pgs 
: rgw: 1 daemon active 
: 
: data: 
: pools: 13 pools, 5056 pgs 
: objects: 1.27 M objects, 4.8 TiB 
: usage: 15 TiB used, 208 TiB / 224 TiB avail 
: pgs: 40124/3803223 objects degraded (1.055%) 
: 1271/3803223 objects misplaced (0.033%) 
: 4963 active+clean 
: 41 active+recovery_wait+undersized+degraded+remapped 
: 21 active+recovery_wait+undersized+degraded 
: 17 active+remapped+backfill_wait 
: 5 active+remapped+backfill_wait+backfill_toofull 
: 3 active+remapped+backfill_toofull 
: 2 active+recovering+undersized+remapped 
: 2 active+recovering+undersized+degraded+remapped 
: 1 active+clean+remapped 
: 1 active+recovering+undersized+degraded 
: 
: io: 
: client: 6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr 
: recovery: 2.0 MiB/s, 92 objects/s 
: 
: Thanks for any hint, 
: 
: -Yenya 
: 
: -- 
: | Jan "Yenya" Kasprzak  |
: | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
: This is the world we live in: the way to deal with computers is to google 
: the symptoms, and hope that you don't have to watch a video. --P. Zaitcev 

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev


Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Fyodor Ustinov
Hi!

I have seen the same thing several times when adding a new OSD to the cluster: one or two
PGs in the "backfill_toofull" state.

This happens in all versions of Mimic.

- Original Message -
From: "Caspar Smit" 
To: "Jan Kasprzak" 
Cc: "ceph-users" 
Sent: Thursday, 31 January, 2019 15:43:07
Subject: Re: [ceph-users] backfill_toofull after adding new OSDs

Hi Jan, 

You might be hitting the same issue as Wido here: 

https://www.spinics.net/lists/ceph-users/msg50603.html

Kind regards, 
Caspar 

On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote:


Hello, ceph users, 

I see the following HEALTH_ERR during cluster rebalance: 

Degraded data redundancy (low space): 8 pgs backfill_toofull 

Detailed description: 
I have upgraded my cluster to mimic and added 16 new bluestore OSDs 
on 4 hosts. The hosts are in a separate region in my crush map, and crush 
rules prevented data from being moved onto the new OSDs. Now I want to move
all data to the new OSDs (and possibly decommission the old filestore OSDs).
I have created the following rule: 

# ceph osd crush rule create-replicated on-newhosts newhostsroot host 

after this, I am slowly moving the pools one-by-one to this new rule: 

# ceph osd pool set test-hdd-pool crush_rule on-newhosts 

When I do this, I get the above error. This is misleading, because 
ceph osd df does not suggest the OSDs are getting full (the most full 
OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR 
disappears. Why am I getting this error? 

# ceph -s 
cluster: 
id: ...my UUID... 
health: HEALTH_ERR 
1271/3803223 objects misplaced (0.033%) 
Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
degraded, 67 pgs undersized 
Degraded data redundancy (low space): 8 pgs backfill_toofull 

services: 
mon: 3 daemons, quorum mon1,mon2,mon3 
mgr: mon2(active), standbys: mon1, mon3 
osd: 80 osds: 80 up, 80 in; 90 remapped pgs 
rgw: 1 daemon active 

data: 
pools: 13 pools, 5056 pgs 
objects: 1.27 M objects, 4.8 TiB 
usage: 15 TiB used, 208 TiB / 224 TiB avail 
pgs: 40124/3803223 objects degraded (1.055%) 
1271/3803223 objects misplaced (0.033%) 
4963 active+clean 
41 active+recovery_wait+undersized+degraded+remapped 
21 active+recovery_wait+undersized+degraded 
17 active+remapped+backfill_wait 
5 active+remapped+backfill_wait+backfill_toofull 
3 active+remapped+backfill_toofull 
2 active+recovering+undersized+remapped 
2 active+recovering+undersized+degraded+remapped 
1 active+clean+remapped 
1 active+recovering+undersized+degraded 

io: 
client: 6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr 
recovery: 2.0 MiB/s, 92 objects/s 

Thanks for any hint, 

-Yenya 

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
This is the world we live in: the way to deal with computers is to google 
the symptoms, and hope that you don't have to watch a video. --P. Zaitcev 


Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Caspar Smit
Hi Jan,

You might be hitting the same issue as Wido here:

https://www.spinics.net/lists/ceph-users/msg50603.html

Kind regards,
Caspar

On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak wrote:

> Hello, ceph users,
>
> I see the following HEALTH_ERR during cluster rebalance:
>
> Degraded data redundancy (low space): 8 pgs backfill_toofull
>
> Detailed description:
> I have upgraded my cluster to mimic and added 16 new bluestore OSDs
> on 4 hosts. The hosts are in a separate region in my crush map, and crush
> rules prevented data from being moved onto the new OSDs. Now I want to move
> all data to the new OSDs (and possibly decommission the old filestore OSDs).
> I have created the following rule:
>
> # ceph osd crush rule create-replicated on-newhosts newhostsroot host
>
> after this, I am slowly moving the pools one-by-one to this new rule:
>
> # ceph osd pool set test-hdd-pool crush_rule on-newhosts
>
> When I do this, I get the above error. This is misleading, because
> ceph osd df does not suggest the OSDs are getting full (the most full
> OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR
> disappears. Why am I getting this error?
>
> # ceph -s
>   cluster:
> id: ...my UUID...
> health: HEALTH_ERR
> 1271/3803223 objects misplaced (0.033%)
> Degraded data redundancy: 40124/3803223 objects degraded
> (1.055%), 65 pgs degraded, 67 pgs undersized
> Degraded data redundancy (low space): 8 pgs backfill_toofull
>
>   services:
> mon: 3 daemons, quorum mon1,mon2,mon3
> mgr: mon2(active), standbys: mon1, mon3
> osd: 80 osds: 80 up, 80 in; 90 remapped pgs
> rgw: 1 daemon active
>
>   data:
> pools:   13 pools, 5056 pgs
> objects: 1.27 M objects, 4.8 TiB
> usage:   15 TiB used, 208 TiB / 224 TiB avail
> pgs: 40124/3803223 objects degraded (1.055%)
>  1271/3803223 objects misplaced (0.033%)
>  4963 active+clean
>  41   active+recovery_wait+undersized+degraded+remapped
>  21   active+recovery_wait+undersized+degraded
>  17   active+remapped+backfill_wait
>  5active+remapped+backfill_wait+backfill_toofull
>  3active+remapped+backfill_toofull
>  2active+recovering+undersized+remapped
>  2active+recovering+undersized+degraded+remapped
>  1active+clean+remapped
>  1active+recovering+undersized+degraded
>
>   io:
> client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
> recovery: 2.0 MiB/s, 92 objects/s
>
> Thanks for any hint,
>
> -Yenya
>
> --
> | Jan "Yenya" Kasprzak  |
> | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
>  This is the world we live in: the way to deal with computers is to google
>  the symptoms, and hope that you don't have to watch a video. --P. Zaitcev


[ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Jan Kasprzak
Hello, ceph users,

I see the following HEALTH_ERR during cluster rebalance:

Degraded data redundancy (low space): 8 pgs backfill_toofull

Detailed description:
I have upgraded my cluster to mimic and added 16 new bluestore OSDs
on 4 hosts. The hosts are in a separate region in my crush map, and crush
rules prevented data from being moved onto the new OSDs. Now I want to move
all data to the new OSDs (and possibly decommission the old filestore OSDs).
I have created the following rule:

# ceph osd crush rule create-replicated on-newhosts newhostsroot host

after this, I am slowly moving the pools one-by-one to this new rule:

# ceph osd pool set test-hdd-pool crush_rule on-newhosts

When I do this, I get the above error. This is misleading, because
ceph osd df does not suggest the OSDs are getting full (the most full
OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR
disappears. Why am I getting this error?
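
(For reference, the thresholds the backfill_toofull check is based on can be read
straight from the OSD map and compared with per-OSD usage; a sketch of what I am
looking at, and nothing here is anywhere near those ratios:)

# ceph osd dump | grep ratio
# ceph osd df tree

The first prints full_ratio, backfillfull_ratio and nearfull_ratio, the second the
per-OSD %USE.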

# ceph -s
  cluster:
id: ...my UUID...
health: HEALTH_ERR
1271/3803223 objects misplaced (0.033%)
Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 
65 pgs degraded, 67 pgs undersized
Degraded data redundancy (low space): 8 pgs backfill_toofull
 
  services:
mon: 3 daemons, quorum mon1,mon2,mon3
mgr: mon2(active), standbys: mon1, mon3
osd: 80 osds: 80 up, 80 in; 90 remapped pgs
rgw: 1 daemon active
 
  data:
pools:   13 pools, 5056 pgs
objects: 1.27 M objects, 4.8 TiB
usage:   15 TiB used, 208 TiB / 224 TiB avail
pgs: 40124/3803223 objects degraded (1.055%)
 1271/3803223 objects misplaced (0.033%)
 4963 active+clean
 41   active+recovery_wait+undersized+degraded+remapped
 21   active+recovery_wait+undersized+degraded
 17   active+remapped+backfill_wait
 5active+remapped+backfill_wait+backfill_toofull
 3active+remapped+backfill_toofull
 2active+recovering+undersized+remapped
 2active+recovering+undersized+degraded+remapped
 1active+clean+remapped
 1active+recovering+undersized+degraded
 
  io:
client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
recovery: 2.0 MiB/s, 92 objects/s
 
Thanks for any hint,

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev