Re: [ceph-users] Reduced data availability: 2 pgs inactive
Hi Paul,

thanks for the hint. Restarting the primary OSDs of the inactive PGs resolved the problem.

Before restarting them they said:

2019-06-19 15:55:36.190 7fcd55c4e700 -1 osd.5 33858 get_health_metrics reporting 15 slow ops, oldest is osd_op(client.220116.0:967410 21.2e4s0 21.d4e19ae4 (undecoded) ondisk+write+known_if_redirected e31569)

and

2019-06-19 15:53:31.214 7f9b946d1700 -1 osd.13 33849 get_health_metrics reporting 14560 slow ops, oldest is osd_op(mds.0.44294:99584053 23.5 23.cad28605 (undecoded) ondisk+write+known_if_redirected+full_force e31562)

Is this something to worry about?

Regards,
Lars

Wed, 19 Jun 2019 15:04:06 +0200
Paul Emmerich ==> Lars Täuber :
> That shouldn't trigger the PG limit (yet), but increasing "mon max pg per
> osd" from the default of 200 is a good idea anyway, since you are running
> with more than 200 PGs per OSD.
>
> I'd try to restart all OSDs that are in the UP set for that PG:
>
> 13, 21, 23, 7, 29, 9, 28, 11, 8
>
> Maybe that solves it (technically it shouldn't). If that doesn't work,
> you'll have to dig deeper into the log files to see where exactly and
> why it is stuck activating.
>
> Paul

--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23
10117 Berlin
Tel.: +49 30 20370-352
http://www.bbaw.de
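For reference, a minimal sketch of that restart procedure (assuming systemd-managed OSDs; the PG id and OSD number are taken from the slow-ops log lines above):

# ceph pg map 21.2e4
(the first entry of the "acting" set is the primary, osd.5 in this case)
# systemctl restart ceph-osd@5
(run on the host that carries osd.5; repeat with osd.13 for pg 23.5)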
Re: [ceph-users] Reduced data availability: 2 pgs inactive
That shouldn't trigger the PG limit (yet), but increasing "mon max pg per
osd" from the default of 200 is a good idea anyway, since you are running
with more than 200 PGs per OSD.

I'd try to restart all OSDs that are in the UP set for that PG:

13, 21, 23, 7, 29, 9, 28, 11, 8

Maybe that solves it (technically it shouldn't). If that doesn't work,
you'll have to dig deeper into the log files to see where exactly and
why it is stuck activating.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 19, 2019 at 2:30 PM Lars Täuber wrote:
> Hi Paul,
>
> thanks for your reply.
>
> Wed, 19 Jun 2019 13:19:55 +0200
> Paul Emmerich ==> Lars Täuber :
> > Wild guess: you hit the PG hard limit, how many PGs per OSD do you have?
> > If this is the case: increase "osd max pg per osd hard ratio"
> >
> > Check "ceph pg query" to see why it isn't activating.
> >
> > Can you share the output of "ceph osd df tree" and "ceph pg query"
> > of the affected PGs?
>
> The pg queries are attached. I can't read them - too much information.
>
> Here is the osd df tree:
> # osd df tree
> ID  CLASS WEIGHT    REWEIGHT SIZE    RAW USE DATA    OMAP     META     AVAIL   %USE VAR  PGS STATUS TYPE NAME
>  -1       167.15057     -    167 TiB 4.7 TiB 1.2 TiB  952 MiB   57 GiB 162 TiB 2.79 1.00   -        root PRZ
> -17        72.43192     -     72 TiB 2.0 TiB 535 GiB  393 MiB   25 GiB  70 TiB 2.78 1.00   -          rack 1-eins
>  -9        22.28674     -     22 TiB 640 GiB 170 GiB   82 MiB  9.0 GiB  22 TiB 2.80 1.01   -            host onode1
>   2  hdd    5.57169    1.0   5.6 TiB 162 GiB  45 GiB   11 MiB  2.3 GiB 5.4 TiB 2.84 1.02 224     up        osd.2
>   9  hdd    5.57169    1.0   5.6 TiB 156 GiB  39 GiB   19 MiB  2.1 GiB 5.4 TiB 2.74 0.98 201     up        osd.9
>  14  hdd    5.57169    1.0   5.6 TiB 162 GiB  44 GiB   24 MiB  2.1 GiB 5.4 TiB 2.84 1.02 230     up        osd.14
>  21  hdd    5.57169    1.0   5.6 TiB 160 GiB  42 GiB   27 MiB  2.5 GiB 5.4 TiB 2.80 1.00 219     up        osd.21
> -13        22.28674     -     22 TiB 640 GiB 170 GiB  123 MiB  8.9 GiB  22 TiB 2.80 1.00   -            host onode4
>   4  hdd    5.57169    1.0   5.6 TiB 156 GiB  39 GiB   38 MiB  2.2 GiB 5.4 TiB 2.73 0.98 205     up        osd.4
>  11  hdd    5.57169    1.0   5.6 TiB 164 GiB  47 GiB   24 MiB  2.0 GiB 5.4 TiB 2.87 1.03 241     up        osd.11
>  18  hdd    5.57169    1.0   5.6 TiB 159 GiB  42 GiB   31 MiB  2.5 GiB 5.4 TiB 2.79 1.00 221     up        osd.18
>  22  hdd    5.57169    1.0   5.6 TiB 160 GiB  43 GiB   29 MiB  2.1 GiB 5.4 TiB 2.81 1.01 225     up        osd.22
>  -5        27.85843     -     28 TiB 782 GiB 195 GiB  188 MiB  6.9 GiB  27 TiB 2.74 0.98   -            host onode7
>   5  hdd    5.57169    1.0   5.6 TiB 158 GiB  41 GiB   26 MiB  1.2 GiB 5.4 TiB 2.77 0.99 213     up        osd.5
>  12  hdd    5.57169    1.0   5.6 TiB 159 GiB  42 GiB   31 MiB  993 MiB 5.4 TiB 2.79 1.00 222     up        osd.12
>  20  hdd    5.57169    1.0   5.6 TiB 157 GiB  40 GiB   47 MiB  1.2 GiB 5.4 TiB 2.76 0.99 212     up        osd.20
>  27  hdd    5.57169    1.0   5.6 TiB 151 GiB  33 GiB   28 MiB  1.9 GiB 5.4 TiB 2.64 0.95 179     up        osd.27
>  29  hdd    5.57169    1.0   5.6 TiB 156 GiB  39 GiB   56 MiB  1.7 GiB 5.4 TiB 2.74 0.98 203     up        osd.29
> -18        44.57349     -     45 TiB 1.3 TiB 341 GiB  248 MiB   14 GiB  43 TiB 2.81 1.01   -          rack 2-zwei
>  -7        22.28674     -     22 TiB 641 GiB 171 GiB  132 MiB  6.7 GiB  22 TiB 2.81 1.01   -            host onode2
>   1  hdd    5.57169    1.0   5.6 TiB 155 GiB  38 GiB   35 MiB  1.2 GiB 5.4 TiB 2.72 0.97 203     up        osd.1
>   8  hdd    5.57169    1.0   5.6 TiB 163 GiB  46 GiB   36 MiB  2.4 GiB 5.4 TiB 2.86 1.02 243     up        osd.8
>  16  hdd    5.57169    1.0   5.6 TiB 161 GiB  43 GiB   24 MiB 1000 MiB 5.4 TiB 2.82 1.01 221     up        osd.16
>  23  hdd    5.57169    1.0   5.6 TiB 162 GiB  45 GiB   37 MiB  2.1 GiB 5.4 TiB 2.84 1.02 228     up        osd.23
>  -3        22.28674     -     22 TiB 640 GiB 170 GiB  116 MiB  7.6 GiB  22 TiB 2.80 1.00   -            host onode5
>   3  hdd    5.57169    1.0   5.6 TiB 154 GiB  36 GiB   14 MiB 1010 MiB 5.4 TiB 2.70 0.97 186     up        osd.3
>   7  hdd    5.57169    1.0   5.6 TiB 161 GiB  44 GiB   22 MiB  2.2 GiB 5.4 TiB 2.82 1.01 221     up        osd.7
>  15  hdd    5.57169    1.0   5.6 TiB 165 GiB  48 GiB   26 MiB  2.3 GiB 5.4 TiB 2.89 1.04 249     up        osd.15
>  24  hdd    5.57169    1.0   5.6 TiB 160 GiB  42 GiB   54 MiB  2.1 GiB 5.4 TiB 2.80 1.00 223     up        osd.24
> -19        50.14517     -     50 TiB 1.4 TiB 376 GiB  311 MiB   18 GiB  49
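A sketch of the limit bump mentioned above, with 300 purely as an example value and assuming a Mimic/Nautilus-era cluster with the centralized config store (on older releases, set "mon max pg per osd" in ceph.conf on the mons instead):

# ceph config set mon mon_max_pg_per_osd 300
# ceph config get mon mon_max_pg_per_osd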
Re: [ceph-users] Reduced data availability: 2 pgs inactive
Hi Paul,

thanks for your reply.

Wed, 19 Jun 2019 13:19:55 +0200
Paul Emmerich ==> Lars Täuber :
> Wild guess: you hit the PG hard limit, how many PGs per OSD do you have?
> If this is the case: increase "osd max pg per osd hard ratio"
>
> Check "ceph pg query" to see why it isn't activating.
>
> Can you share the output of "ceph osd df tree" and "ceph pg query"
> of the affected PGs?

The pg queries are attached. I can't read them - too much information.

Here is the osd df tree:
# osd df tree
ID  CLASS WEIGHT    REWEIGHT SIZE    RAW USE DATA    OMAP     META     AVAIL   %USE VAR  PGS STATUS TYPE NAME
 -1       167.15057     -    167 TiB 4.7 TiB 1.2 TiB  952 MiB   57 GiB 162 TiB 2.79 1.00   -        root PRZ
-17        72.43192     -     72 TiB 2.0 TiB 535 GiB  393 MiB   25 GiB  70 TiB 2.78 1.00   -          rack 1-eins
 -9        22.28674     -     22 TiB 640 GiB 170 GiB   82 MiB  9.0 GiB  22 TiB 2.80 1.01   -            host onode1
  2  hdd    5.57169    1.0   5.6 TiB 162 GiB  45 GiB   11 MiB  2.3 GiB 5.4 TiB 2.84 1.02 224     up        osd.2
  9  hdd    5.57169    1.0   5.6 TiB 156 GiB  39 GiB   19 MiB  2.1 GiB 5.4 TiB 2.74 0.98 201     up        osd.9
 14  hdd    5.57169    1.0   5.6 TiB 162 GiB  44 GiB   24 MiB  2.1 GiB 5.4 TiB 2.84 1.02 230     up        osd.14
 21  hdd    5.57169    1.0   5.6 TiB 160 GiB  42 GiB   27 MiB  2.5 GiB 5.4 TiB 2.80 1.00 219     up        osd.21
-13        22.28674     -     22 TiB 640 GiB 170 GiB  123 MiB  8.9 GiB  22 TiB 2.80 1.00   -            host onode4
  4  hdd    5.57169    1.0   5.6 TiB 156 GiB  39 GiB   38 MiB  2.2 GiB 5.4 TiB 2.73 0.98 205     up        osd.4
 11  hdd    5.57169    1.0   5.6 TiB 164 GiB  47 GiB   24 MiB  2.0 GiB 5.4 TiB 2.87 1.03 241     up        osd.11
 18  hdd    5.57169    1.0   5.6 TiB 159 GiB  42 GiB   31 MiB  2.5 GiB 5.4 TiB 2.79 1.00 221     up        osd.18
 22  hdd    5.57169    1.0   5.6 TiB 160 GiB  43 GiB   29 MiB  2.1 GiB 5.4 TiB 2.81 1.01 225     up        osd.22
 -5        27.85843     -     28 TiB 782 GiB 195 GiB  188 MiB  6.9 GiB  27 TiB 2.74 0.98   -            host onode7
  5  hdd    5.57169    1.0   5.6 TiB 158 GiB  41 GiB   26 MiB  1.2 GiB 5.4 TiB 2.77 0.99 213     up        osd.5
 12  hdd    5.57169    1.0   5.6 TiB 159 GiB  42 GiB   31 MiB  993 MiB 5.4 TiB 2.79 1.00 222     up        osd.12
 20  hdd    5.57169    1.0   5.6 TiB 157 GiB  40 GiB   47 MiB  1.2 GiB 5.4 TiB 2.76 0.99 212     up        osd.20
 27  hdd    5.57169    1.0   5.6 TiB 151 GiB  33 GiB   28 MiB  1.9 GiB 5.4 TiB 2.64 0.95 179     up        osd.27
 29  hdd    5.57169    1.0   5.6 TiB 156 GiB  39 GiB   56 MiB  1.7 GiB 5.4 TiB 2.74 0.98 203     up        osd.29
-18        44.57349     -     45 TiB 1.3 TiB 341 GiB  248 MiB   14 GiB  43 TiB 2.81 1.01   -          rack 2-zwei
 -7        22.28674     -     22 TiB 641 GiB 171 GiB  132 MiB  6.7 GiB  22 TiB 2.81 1.01   -            host onode2
  1  hdd    5.57169    1.0   5.6 TiB 155 GiB  38 GiB   35 MiB  1.2 GiB 5.4 TiB 2.72 0.97 203     up        osd.1
  8  hdd    5.57169    1.0   5.6 TiB 163 GiB  46 GiB   36 MiB  2.4 GiB 5.4 TiB 2.86 1.02 243     up        osd.8
 16  hdd    5.57169    1.0   5.6 TiB 161 GiB  43 GiB   24 MiB 1000 MiB 5.4 TiB 2.82 1.01 221     up        osd.16
 23  hdd    5.57169    1.0   5.6 TiB 162 GiB  45 GiB   37 MiB  2.1 GiB 5.4 TiB 2.84 1.02 228     up        osd.23
 -3        22.28674     -     22 TiB 640 GiB 170 GiB  116 MiB  7.6 GiB  22 TiB 2.80 1.00   -            host onode5
  3  hdd    5.57169    1.0   5.6 TiB 154 GiB  36 GiB   14 MiB 1010 MiB 5.4 TiB 2.70 0.97 186     up        osd.3
  7  hdd    5.57169    1.0   5.6 TiB 161 GiB  44 GiB   22 MiB  2.2 GiB 5.4 TiB 2.82 1.01 221     up        osd.7
 15  hdd    5.57169    1.0   5.6 TiB 165 GiB  48 GiB   26 MiB  2.3 GiB 5.4 TiB 2.89 1.04 249     up        osd.15
 24  hdd    5.57169    1.0   5.6 TiB 160 GiB  42 GiB   54 MiB  2.1 GiB 5.4 TiB 2.80 1.00 223     up        osd.24
-19        50.14517     -     50 TiB 1.4 TiB 376 GiB  311 MiB   18 GiB  49 TiB 2.79 1.00   -          rack 3-drei
-15        22.28674     -     22 TiB 649 GiB 179 GiB  112 MiB  8.2 GiB  22 TiB 2.84 1.02   -            host onode3
  0  hdd    5.57169    1.0   5.6 TiB 162 GiB  45 GiB   28 MiB  996 MiB 5.4 TiB 2.84 1.02 229     up        osd.0
 10  hdd    5.57169    1.0   5.6 TiB 159 GiB  42 GiB   21 MiB  2.2 GiB 5.4 TiB 2.79 1.00 213     up        osd.10
 17  hdd    5.57169    1.0   5.6 TiB 165 GiB  47 GiB   19 MiB  2.5 GiB 5.4 TiB 2.88 1.03 238     up        osd.17
 25  hdd    5.57169    1.0   5.6 TiB 163 GiB  46 GiB   44 MiB  2.5 GiB 5.4 TiB 2.86 1.03 242     up        osd.25
-11        27.85843     -     28 TiB 784 GiB 197 GiB  199 MiB  9.4 GiB  27 TiB 2.75 0.99   -            host onode6
  6  hdd
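To see at a glance how many PGs each OSD carries (the PGS column above, mostly above 200 here), something like this can help; a sketch assuming jq is available and that the JSON field names match this release's output:

# ceph osd df --format json | jq '[.nodes[] | {osd: .name, pgs: .pgs}]'
# ceph osd df --format json | jq '[.nodes[].pgs] | add / length'
(the second line prints the average PG count per OSD)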
Re: [ceph-users] Reduced data availability: 2 pgs inactive
Wild guess: you hit the PG hard limit, how many PGs per OSD do you have?
If this is the case: increase "osd max pg per osd hard ratio"

Check "ceph pg query" to see why it isn't activating.

Can you share the output of "ceph osd df tree" and "ceph pg query"
of the affected PGs?

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 19, 2019 at 8:52 AM Lars Täuber wrote:
> Hi there!
>
> Recently I made our cluster rack aware
> by adding racks to the crush map.
> The failure domain was and still is "host".
>
> rule cephfs2_data {
>         id 7
>         type erasure
>         min_size 3
>         max_size 6
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         step take PRZ
>         step chooseleaf indep 0 type host
>         step emit
> }
>
> Then I sorted the hosts into the new
> rack buckets of the crush map as they
> are in reality, by:
> # osd crush move onodeX rack=XYZ
> for all hosts.
>
> The cluster started to reorder the data.
>
> In the end the cluster has now:
> HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs
> inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%),
> 2 pgs degraded, 2 pgs undersized
> FS_DEGRADED 1 filesystem is degraded
>     fs cephfs_1 is degraded
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive
>     pg 21.2e4 is stuck inactive for 142792.952697, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting
> [5,2147483647,25,28,11,2]
>     pg 23.5 is stuck inactive for 142791.437243, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
> PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded
> (0.029%), 2 pgs degraded, 2 pgs undersized
>     pg 21.2e4 is stuck undersized for 142779.321192, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting
> [5,2147483647,25,28,11,2]
>     pg 23.5 is stuck undersized for 142789.747915, current state
> activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
>
> The cluster hosts a cephfs which is
> not mountable anymore.
>
> I tried a few things (as you can see:
> forced_backfill), but failed.
>
> The cephfs_data pool is EC 4+2.
> Both inactive pgs seem to have enough
> copies to recalculate the contents for
> all osds.
>
> Is there a chance to get both pgs
> clean again?
>
> How can I force the pgs to recalculate
> all necessary copies?
>
> Thanks
> Lars
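A sketch of both suggestions above, with the ratio value 3 purely as an example and assuming a release where the centralized config store and these option/field names exist (the hard ratio is applied on top of "mon max pg per osd"):

# ceph config set osd osd_max_pg_per_osd_hard_ratio 3
# ceph pg 21.2e4 query | jq '{state: .state, recovery: [.recovery_state[].name]}'
(the jq filter trims the very verbose pg query output down to the PG state and the peering/recovery stages)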
[ceph-users] Reduced data availability: 2 pgs inactive
Hi there!

Recently I made our cluster rack aware
by adding racks to the crush map.
The failure domain was and still is "host".

rule cephfs2_data {
        id 7
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take PRZ
        step chooseleaf indep 0 type host
        step emit
}

Then I sorted the hosts into the new
rack buckets of the crush map as they
are in reality, by:
# osd crush move onodeX rack=XYZ
for all hosts.

The cluster started to reorder the data.

In the end the cluster has now:
HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs
inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%),
2 pgs degraded, 2 pgs undersized
FS_DEGRADED 1 filesystem is degraded
    fs cephfs_1 is degraded
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
    pg 21.2e4 is stuck inactive for 142792.952697, current state
activating+undersized+degraded+remapped+forced_backfill, last acting
[5,2147483647,25,28,11,2]
    pg 23.5 is stuck inactive for 142791.437243, current state
activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded
(0.029%), 2 pgs degraded, 2 pgs undersized
    pg 21.2e4 is stuck undersized for 142779.321192, current state
activating+undersized+degraded+remapped+forced_backfill, last acting
[5,2147483647,25,28,11,2]
    pg 23.5 is stuck undersized for 142789.747915, current state
activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]

The cluster hosts a cephfs which is
not mountable anymore.

I tried a few things (as you can see:
forced_backfill), but failed.

The cephfs_data pool is EC 4+2.
Both inactive pgs seem to have enough
copies to recalculate the contents for
all osds.

Is there a chance to get both pgs
clean again?

How can I force the pgs to recalculate
all necessary copies?

Thanks
Lars
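For completeness, the rack buckets described above are typically created and populated like this before the data starts to move; a sketch using the root and bucket names from this cluster (repeat for each rack and host):

# ceph osd crush add-bucket 1-eins rack
# ceph osd crush move 1-eins root=PRZ
# ceph osd crush move onode1 rack=1-eins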