Re: [ceph-users] backfill_toofull after adding new OSDs
Yeah, this happens all the time during backfilling since Mimic and is some
kind of bug. It will always resolve itself, but it's still quite annoying.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Mar 6, 2019 at 3:33 PM Simon Ironside wrote:
>
> I've just seen this when *removing* an OSD too.
> Issue resolved itself during recovery. OSDs were not full, not even
> close, there's virtually nothing on this cluster.
> Mimic 13.2.4 on RHEL 7.6. OSDs are all Bluestore HDD with SSD DBs.
> Everything is otherwise default.
> [...]
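A quick way to confirm the flag is spurious rather than a real space problem
is to compare actual OSD usage against the threshold the backfill check uses.
A sketch, assuming a Mimic-era cluster; the ratio values shown are the usual
defaults, not taken from any cluster in this thread:

# ceph osd df                     # per-OSD %USE; nothing should be near 90%
# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

If every OSD sits far below backfillfull_ratio and the flag still appears,
it is the transient accounting issue described above, not real pressure.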
Re: [ceph-users] backfill_toofull after adding new OSDs
I've just seen this when *removing* an OSD too. Issue resolved itself
during recovery. OSDs were not full, not even close, there's virtually
nothing on this cluster. Mimic 13.2.4 on RHEL 7.6. OSDs are all Bluestore
HDD with SSD DBs. Everything is otherwise default.

  cluster:
    id:     MY ID
    health: HEALTH_ERR
            1161/66039 objects misplaced (1.758%)
            Degraded data redundancy: 220095/66039 objects degraded
            (333.280%), 137 pgs degraded
            Degraded data redundancy (low space): 1 pg backfill_toofull

  services:
    mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
    mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
    osd: 53 osds: 52 up, 52 in; 186 remapped pgs

  data:
    pools:   16 pools, 2016 pgs
    objects: 22.01 k objects, 83 GiB
    usage:   7.9 TiB used, 473 TiB / 481 TiB avail
    pgs:     220095/66039 objects degraded (333.280%)
             1161/66039 objects misplaced (1.758%)
             1830 active+clean
             134  active+recovery_wait+undersized+degraded+remapped
             45   active+remapped+backfill_wait
             3    active+recovering+undersized+remapped
             3    active+recovery_wait+undersized+degraded
             1    active+remapped+backfill_wait+backfill_toofull

  io:
    client:   60 KiB/s wr, 0 op/s rd, 5 op/s wr
    recovery: 8.6 MiB/s, 110 objects/s

On 07/02/2019 04:26, Brad Hubbard wrote:
> Let's try to restrict discussion to the original thread
> "backfill_toofull while OSDs are not full" and get a tracker opened up
> for this issue.
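To see exactly which PGs carry the flag at any moment, the PG listing can be
filtered by state. A sketch using Mimic-era commands; the OSD id in the
second command is illustrative, not from this cluster:

# ceph pg ls backfill_toofull
# ceph pg ls-by-osd osd.12 backfill_toofull   # narrow to one suspect OSD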
Re: [ceph-users] backfill_toofull after adding new OSDs
Let's try to restrict discussion to the original thread
"backfill_toofull while OSDs are not full" and get a tracker opened up
for this issue.

On Sat, Feb 2, 2019 at 11:52 AM Fyodor Ustinov wrote:
>
> Hi!
>
> Right now, after adding OSD:
>
> # ceph health detail
> HEALTH_ERR 74197563/199392333 objects misplaced (37.212%); Degraded data
> redundancy (low space): 1 pg backfill_toofull
> OBJECT_MISPLACED 74197563/199392333 objects misplaced (37.212%)
> PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
>     pg 6.eb is active+remapped+backfill_wait+backfill_toofull, acting
> [21,0,47]
> [...]
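For a report like the one quoted above, querying the affected PG shows which
backfill target the toofull check was evaluated against, which is useful
data for a tracker ticket. A sketch only; the PG id is taken from the quoted
output:

# ceph pg 6.eb query | grep -A 3 backfill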
Re: [ceph-users] backfill_toofull after adding new OSDs
Hi!

Right now, after adding OSD:

# ceph health detail
HEALTH_ERR 74197563/199392333 objects misplaced (37.212%); Degraded data
redundancy (low space): 1 pg backfill_toofull
OBJECT_MISPLACED 74197563/199392333 objects misplaced (37.212%)
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
    pg 6.eb is active+remapped+backfill_wait+backfill_toofull, acting [21,0,47]

# ceph pg ls-by-pool iscsi backfill_toofull
PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES LOG STATE
STATE_STAMP VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
6.eb 6450 1290 0 1645654016 3067
active+remapped+backfill_wait+backfill_toofull 2019-02-02 00:20:32.975300
7208'6567 9790:16214 [5,1,21]p5 [21,0,47]p21 2019-01-18 04:13:54.280495
2019-01-18 04:13:54.280495

All OSDs have less than 40% USE.

ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 9.56149  1.0     9.6 TiB 3.2 TiB 6.3 TiB 33.64 1.31 313
 1   hdd 9.56149  1.0     9.6 TiB 3.3 TiB 6.3 TiB 34.13 1.33 295
 5   hdd 9.56149  1.0     9.6 TiB 756 GiB 8.8 TiB  7.72 0.30 103
47   hdd 9.32390  1.0     9.3 TiB 3.1 TiB 6.2 TiB 33.75 1.31 306

(all other OSDs are also below 40%)

ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)

Maybe the developers will pay attention to this message and say something?

----- Original Message -----
From: "Fyodor Ustinov"
To: "Caspar Smit"
Cc: "Jan Kasprzak", "ceph-users"
Sent: Thursday, 31 January, 2019 16:50:24
Subject: Re: [ceph-users] backfill_toofull after adding new OSDs

> I saw the same several times when I added a new osd to the cluster.
> One-two pg in "backfill_toofull" state.
>
> In all versions of mimic.
> [...]
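When a cluster this empty still trips the check, one workaround is to nudge
the backfillfull threshold up slightly until the rebalance completes, then
put it back. A hedged sketch only; 0.90 is the usual default, so the value
actually in effect should be read from the cluster first:

# ceph osd dump | grep backfillfull
# ceph osd set-backfillfull-ratio 0.92
  (wait for backfill to finish)
# ceph osd set-backfillfull-ratio 0.90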
Re: [ceph-users] backfill_toofull after adding new OSDs
OKay, now I changed the crush rule also on a pool with the real data,
and it seems all the client i/o on that pool has stopped. The recovery
continues, but things like qemu I/O, "rbd ls", and so on are just stuck
doing nothing.

Can I unstuck it somehow (faster than waiting for all the recovery
to finish)? Thanks.

# ceph -s
  cluster:
    id:     ... my-uuid ...
    health: HEALTH_ERR
            3308311/3803892 objects misplaced (86.972%)
            Reduced data availability: 1721 pgs inactive
            Degraded data redundancy: 85361/3803892 objects degraded
            (2.244%), 139 pgs degraded, 139 pgs undersized
            Degraded data redundancy (low space): 25 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon2(active), standbys: mon1, mon3
    osd: 80 osds: 80 up, 80 in; 1868 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   13 pools, 5056 pgs
    objects: 1.27 M objects, 4.8 TiB
    usage:   15 TiB used, 208 TiB / 224 TiB avail
    pgs:     34.039% pgs not active
             85361/3803892 objects degraded (2.244%)
             3308311/3803892 objects misplaced (86.972%)
             3188 active+clean
             1582 activating+remapped
             139  activating+undersized+degraded+remapped
             93   active+remapped+backfill_wait
             29   active+remapped+backfilling
             25   active+remapped+backfill_wait+backfill_toofull

  io:
    recovery: 174 MiB/s, 43 objects/s

-Yenya

Jan Kasprzak wrote:
: [...]
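With 1721 PGs stuck in "activating" states, the blocked client I/O is the
peering backlog rather than backfill_toofull itself. One common cause of
mass "activating+remapped" after a large CRUSH change is the per-OSD PG
limit (overdose protection). A diagnostic sketch, under the assumption that
this limit is the culprit; the PG and OSD ids are illustrative:

# ceph pg dump_stuck inactive                       # list the stuck PGs
# ceph pg 2.1a query                                # inspect peering state
# ceph daemon osd.0 config get mon_max_pg_per_osd   # commonly 250 in Mimic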
Re: [ceph-users] backfill_toofull after adding new OSDs
Fyodor Ustinov wrote:
: Hi!
:
: I saw the same several times when I added a new osd to the cluster.
: One-two pg in "backfill_toofull" state.
:
: In all versions of mimic.

	Yep. In my case it is not (only) after adding the new OSDs.
An hour or so ago my cluster reached the HEALTH_OK state, so I moved
another pool to the new hosts with "crush_rule on-newhosts". The result
was immediate backfill_toofull on two PGs for about five minutes, and
then it reached HEALTH_OK again. So the PGs are not stuck in that state
forever, they are there only during the data reshuffle.

	13.2.4 on CentOS 7.

-Yenya

--
| Jan "Yenya" Kasprzak                                                |
| http://www.fi.muni.cz/~kas/   GPG: 4096R/A45477D5                   |
This is the world we live in: the way to deal with computers is to google
the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
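The transient nature is easy to capture by streaming cluster events while a
pool is being moved. A sketch; the pool and rule names are the ones used
earlier in this thread:

# ceph osd pool set test-hdd-pool crush_rule on-newhosts
# ceph -w | grep -i toofull        # watch the flag appear and clear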
Re: [ceph-users] backfill_toofull after adding new OSDs
Hi!

I saw the same several times when I added a new osd to the cluster.
One-two pg in "backfill_toofull" state.

In all versions of mimic.

----- Original Message -----
From: "Caspar Smit"
To: "Jan Kasprzak"
Cc: "ceph-users"
Sent: Thursday, 31 January, 2019 15:43:07
Subject: Re: [ceph-users] backfill_toofull after adding new OSDs

> Hi Jan,
>
> You might be hitting the same issue as Wido here:
> https://www.spinics.net/lists/ceph-users/msg50603.html
> [...]
Re: [ceph-users] backfill_toofull after adding new OSDs
Hi Jan,

You might be hitting the same issue as Wido here:

https://www.spinics.net/lists/ceph-users/msg50603.html

Kind regards,
Caspar

On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak wrote:

> Hello, ceph users,
>
> I see the following HEALTH_ERR during cluster rebalance:
>
>     Degraded data redundancy (low space): 8 pgs backfill_toofull
> [...]
[ceph-users] backfill_toofull after adding new OSDs
Hello, ceph users,

I see the following HEALTH_ERR during cluster rebalance:

	Degraded data redundancy (low space): 8 pgs backfill_toofull

Detailed description:
I have upgraded my cluster to mimic and added 16 new bluestore OSDs
on 4 hosts. The hosts are in a separate region in my crush map, and crush
rules prevented data from being moved to the new OSDs. Now I want to move
all data to the new OSDs (and possibly decommission the old filestore
OSDs). I have created the following rule:

# ceph osd crush rule create-replicated on-newhosts newhostsroot host

after this, I am slowly moving the pools one-by-one to this new rule:

# ceph osd pool set test-hdd-pool crush_rule on-newhosts

When I do this, I get the above error. This is misleading, because
ceph osd df does not suggest the OSDs are getting full (the most full
OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR
disappears. Why am I getting this error?

# ceph -s
  cluster:
    id:     ...my UUID...
    health: HEALTH_ERR
            1271/3803223 objects misplaced (0.033%)
            Degraded data redundancy: 40124/3803223 objects degraded
            (1.055%), 65 pgs degraded, 67 pgs undersized
            Degraded data redundancy (low space): 8 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon2(active), standbys: mon1, mon3
    osd: 80 osds: 80 up, 80 in; 90 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   13 pools, 5056 pgs
    objects: 1.27 M objects, 4.8 TiB
    usage:   15 TiB used, 208 TiB / 224 TiB avail
    pgs:     40124/3803223 objects degraded (1.055%)
             1271/3803223 objects misplaced (0.033%)
             4963 active+clean
             41   active+recovery_wait+undersized+degraded+remapped
             21   active+recovery_wait+undersized+degraded
             17   active+remapped+backfill_wait
             5    active+remapped+backfill_wait+backfill_toofull
             3    active+remapped+backfill_toofull
             2    active+recovering+undersized+remapped
             2    active+recovering+undersized+degraded+remapped
             1    active+clean+remapped
             1    active+recovering+undersized+degraded

  io:
    client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
    recovery: 2.0 MiB/s, 92 objects/s

Thanks for any hint,

-Yenya

--
| Jan "Yenya" Kasprzak                                                |
| http://www.fi.muni.cz/~kas/   GPG: 4096R/A45477D5                   |
This is the world we live in: the way to deal with computers is to google
the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
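For reference, the rule assignment can be verified per pool before and after
such a move, and the rule itself dumped to double-check the chosen root and
failure domain. A sketch using the names from the message above:

# ceph osd pool get test-hdd-pool crush_rule
crush_rule: on-newhosts
# ceph osd crush rule dump on-newhosts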