Re: [ceph-users] Problems with CephFS
Thanks guys, I was out of the office; I will try your suggestions and get back to you. And extending the cluster is something that I will do in the near future, I just thought it would be better to get the cluster health back to "normal" first.

Thanks,
Herbert

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Vadim Bulst
Sent: Tuesday, June 12, 2018 22:34
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Problems with CephFS
Re: [ceph-users] Problems with CephFS
Well Herbert, as Paul mentioned, you should reconfigure the threshold of your OSDs first and reweight second. Paul has sent you some hints.

Jewel documentation (http://docs.ceph.com/docs/jewel/rados/):

    osd backfill full ratio
    Description: Refuse to accept backfill requests when the Ceph OSD Daemon's full ratio is above this value.
    Type: Float
    Default: 0.85

You could put this into your config with a value of 0.9 on all OSD servers and restart the OSD daemons. Don't forget "ceph osd set noout" before the restart, and "ceph osd unset noout" after restarting the daemons; the resync should take place instantly. Now set the reweight on OSDs 1, 0 and 2 to a value like 0.9: "ceph osd reweight 1 0.9", and so on.

Herbert, you really should extend your cluster, and/or evacuate your data and rebuild it from scratch.

Cheers,
Vadim
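As a sketch of the config change Vadim describes (the option name is taken from the Jewel documentation he links; the file path is the usual default and may differ per host), the setting would go into the [osd] section of /etc/ceph/ceph.conf on every OSD server:

```ini
[osd]
# Accept backfill requests until an OSD is 90% full (Jewel default: 0.85).
# Revert this once the cluster has been extended and rebalanced.
osd backfill full ratio = 0.9
```

followed, as Vadim says, by "ceph osd set noout", a restart of the ceph-osd daemons on each host, and "ceph osd unset noout".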
Re: [ceph-users] Problems with CephFS
Hi,

thanks guys for your answers. 'ceph osd df' gives me:

[root@pcl241 ceph]# ceph osd df
ID WEIGHT   REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
 1 18.18999      1.0 18625G 15705G 2919G 84.32 1.04 152
 0 18.18999      1.0 18625G 15945G 2680G 85.61 1.06 165
 3 18.18999      1.0 18625G 14755G 3870G 79.22 0.98 162
 4 18.18999      1.0 18625G 14503G 4122G 77.87 0.96 158
 2 18.18999      1.0 18625G 15965G 2660G 85.72 1.06 165
 5 18.18999      1.0 21940G 16054G 5886G 73.17 0.91 159
             TOTAL 112T   92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64

And:

[root@pcl241 ceph]# ceph osd df tree
ID  WEIGHT    REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS TYPE NAME
 -1 109.13992        -      0      0     0     0    0   0 root default
 -2         0        -      0      0     0     0    0   0     host A1214-2950-01
 -3         0        -      0      0     0     0    0   0     host A1214-2950-02
 -4         0        -      0      0     0     0    0   0     host A1214-2950-04
 -5         0        -      0      0     0     0    0   0     host A1214-2950-05
 -6         0        -      0      0     0     0    0   0     host A1214-2950-03
 -7  18.18999        - 18625G 15705G 2919G 84.32 1.04   0     host cuda002
  1  18.18999      1.0 18625G 15705G 2919G 84.32 1.04 152         osd.1
 -8  18.18999        - 18625G 15945G 2680G 85.61 1.06   0     host cuda001
  0  18.18999      1.0 18625G 15945G 2680G 85.61 1.06 165         osd.0
 -9  18.18999        - 18625G 14755G 3870G 79.22 0.98   0     host cuda005
  3  18.18999      1.0 18625G 14755G 3870G 79.22 0.98 162         osd.3
-10  18.18999        - 18625G 14503G 4122G 77.87 0.96   0     host cuda003
  4  18.18999      1.0 18625G 14503G 4122G 77.87 0.96 158         osd.4
-11  18.18999        - 18625G 15965G 2660G 85.72 1.06   0     host cuda004
  2  18.18999      1.0 18625G 15965G 2660G 85.72 1.06 165         osd.2
-12  18.18999        - 21940G 16054G 5886G 73.17 0.91   0     host A1214-2950-06
  5  18.18999      1.0 21940G 16054G 5886G 73.17 0.91 159         osd.5
-13         0        -      0      0     0     0    0   0     host pe9
              TOTAL 112T   92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64
[root@pcl241 ceph]#

Is it wise to reduce the weight?

Thanks,
Best,
Herbert

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Vadim Bulst
Sent: Tuesday, June 12, 2018 11:16
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Problems with CephFS
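As an aside, the VAR column in the output above is simply each OSD's %USE divided by the cluster-wide average of 80.76% shown on the TOTAL line, which you can verify from the numbers themselves (a quick sketch; awk is used here only as a calculator):

```shell
# Recompute the VAR column from the %USE values in the table above.
# VAR = per-OSD utilization / cluster-wide average utilization (80.76%).
printf '%s\n' '1 84.32' '0 85.61' '3 79.22' '4 77.87' '2 85.72' '5 73.17' \
  | awk '{ printf "osd.%s  VAR %.2f\n", $1, $2 / 80.76 }'
```

OSDs 0 and 2 at VAR 1.06 are the fullest; osd.5 at 0.91 has the most headroom.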
Re: [ceph-users] Problems with CephFS
Figure out which OSDs are too full:

    ceph osd df tree

Then you can either reduce their weight:

    ceph osd reweight <osd-id> 0.9

Or increase the threshold above which an OSD is considered too full for backfills. How this is configured depends on the version; I think in your version it is still

    ceph pg set_backfillfull_ratio 0.XX

It's probably currently configured to 0.85.


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
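Paul's first step can be scripted. A minimal sketch, using the `ceph osd df` numbers Herbert posted as sample input (column 7 is %USE; on a live cluster you would pipe the output of `ceph osd df` directly instead of the here-string):

```shell
# Flag OSDs whose utilization exceeds the default 85% backfill-full threshold.
# Sample data copied from Herbert's `ceph osd df` output (%USE is field 7).
sample='1 18.18999 1.0 18625G 15705G 2919G 84.32 1.04 152
0 18.18999 1.0 18625G 15945G 2680G 85.61 1.06 165
3 18.18999 1.0 18625G 14755G 3870G 79.22 0.98 162
4 18.18999 1.0 18625G 14503G 4122G 77.87 0.96 158
2 18.18999 1.0 18625G 15965G 2660G 85.72 1.06 165
5 18.18999 1.0 21940G 16054G 5886G 73.17 0.91 159'

echo "$sample" | awk '$7 > 85 { print "osd." $1 " too full at " $7 "%" }'
```

On this data it flags osd.0 and osd.2, the two OSDs tripping the nearfull warning and blocking the backfills.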
Re: [ceph-users] Problems with CephFS
Hi Herbert,

could you please run "ceph osd df"?

Cheers,
Vadim

On 12.06.2018 11:06, Steininger, Herbert wrote:

Hi guys,

I've inherited a CephFS cluster; I'm fairly new to CephFS.
The cluster was down and I somehow managed to bring it up again.
But now there are some problems that I can't fix that easily.
This is what 'ceph -s' gives me as info:

[root@pcl241 ceph]# ceph -s
    cluster cde1487e-f930-417a-9403-28e9ebf406b8
     health HEALTH_WARN
            2 pgs backfill_toofull
            1 pgs degraded
            1 pgs stuck degraded
            2 pgs stuck unclean
            1 pgs stuck undersized
            1 pgs undersized
            recovery 260/29731463 objects degraded (0.001%)
            recovery 798/29731463 objects misplaced (0.003%)
            2 near full osd(s)
            crush map has legacy tunables (require bobtail, min is firefly)
            crush map has straw_calc_version=0
     monmap e8: 3 mons at {cephcontrol=172.22.12.241:6789/0,slurmbackup=172.22.20.4:6789/0,slurmmaster=172.22.20.3:6789/0}
            election epoch 48, quorum 0,1,2 cephcontrol,slurmmaster,slurmbackup
      fsmap e2288: 1/1/1 up {0=pcl241=up:active}
     osdmap e10865: 6 osds: 6 up, 6 in; 2 remapped pgs
            flags nearfull
      pgmap v14103169: 320 pgs, 3 pools, 30899 GB data, 9678 kobjects
            92929 GB used, 22139 GB / 112 TB avail
            260/29731463 objects degraded (0.001%)
            798/29731463 objects misplaced (0.003%)
                 316 active+clean
                   2 active+clean+scrubbing+deep
                   1 active+undersized+degraded+remapped+backfill_toofull
                   1 active+remapped+backfill_toofull
[root@pcl241 ceph]#

[root@pcl241 ceph]# ceph osd tree
ID  WEIGHT    TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 109.13992 root default
 -2         0     host A1214-2950-01
 -3         0     host A1214-2950-02
 -4         0     host A1214-2950-04
 -5         0     host A1214-2950-05
 -6         0     host A1214-2950-03
 -7  18.18999     host cuda002
  1  18.18999         osd.1               up      1.0              1.0
 -8  18.18999     host cuda001
  0  18.18999         osd.0               up      1.0              1.0
 -9  18.18999     host cuda005
  3  18.18999         osd.3               up      1.0              1.0
-10  18.18999     host cuda003
  4  18.18999         osd.4               up      1.0              1.0
-11  18.18999     host cuda004
  2  18.18999         osd.2               up      1.0              1.0
-12  18.18999     host A1214-2950-06
  5  18.18999         osd.5               up      1.0              1.0
-13         0     host pe9

Could someone please point me in the right direction about what to do to fix the problems?
It seems that two OSDs are full, but how can I solve that if I don't have additional hardware available?
It also seems that the cluster is running different Ceph versions (Hammer and Jewel); how do I solve that?
Ceph (mds/mon/osd) is running on Scientific Linux.
If more info is needed, just let me know.

Thanks in advance,
Steininger Herbert

---
Herbert Steininger
Leiter EDV
Administrator
Max-Planck-Institut für Psychiatrie - EDV
Kraepelinstr. 2-10
80804 München
Tel +49 (0)89 / 30622-368
Mail herbert_steinin...@psych.mpg.de
Web http://www.psych.mpg.de

--
Vadim Bulst
Universität Leipzig / URZ
04109 Leipzig, Augustusplatz 10
phone: ++49-341-97-33380
mail: vadim.bu...@uni-leipzig.de
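The recovery percentages in the `ceph -s` output can be cross-checked from the raw object counts it reports (260 degraded and 798 misplaced out of 29,731,463 objects); a quick sketch, again using awk only as a calculator:

```shell
# Recompute the degraded/misplaced percentages reported by `ceph -s` above.
awk 'BEGIN {
  total = 29731463
  printf "degraded:  %.3f%%\n", 260 / total * 100
  printf "misplaced: %.3f%%\n", 798 / total * 100
}'
```

This reproduces the 0.001% and 0.003% figures, confirming the degradation itself is tiny; the real problem is the two backfill_toofull PGs that cannot recover until the full OSDs get headroom.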