Re: [ceph-users] Very slow start of osds after reboot
Hi,

I have the same issue with Ceph Jewel (10.2.9), RHEL 7 and dmcrypt. Is there
any fix, or at least a workaround, available?

Regards,
Manuel

On Thu, 31 Aug 2017 16:24:10 +0200, Piotr Dzionek wrote:
> Hi,
>
> For the last 3 weeks I have been running the latest LTS Luminous Ceph
> release on CentOS 7. It started with the 4th RC and now I have the stable
> release. The cluster runs fine, however I noticed that if I reboot one of
> the nodes, it takes a really long time for the cluster to get back to OK
> status. The OSDs do start, but not as soon as the server is up; they come
> up one by one over a period of about 5 minutes. I checked the logs and all
> OSDs have the following errors.
>
> As you can see, the XFS volume (the part with the meta-data) is not
> mounted yet. My question here: what mounts it, and why does it take so
> long? Maybe there is a setting that randomizes the start-up of the OSDs
> running on the same node?
>
> Kind regards,
> Piotr Dzionek

--
Manuel Lausch
Systemadministrator Cloud Services
1&1 Mail & Media Development & Technology GmbH
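P.S. Not a fix, but in case it is useful as a stop-gap: after a reboot I
sometimes kick the activation by hand. This is only a rough sketch for a
ceph-disk/FileStore setup; the OSD id and path below are placeholders, and I
have not verified how activate-all behaves together with dmcrypt:

    # Check whether the OSD data dir has actually been mounted yet
    # (ceph-27 is just an example id):
    findmnt /var/lib/ceph/osd/ceph-27 || echo "ceph-27 not mounted yet"

    # Re-run ceph-disk activation for all tagged OSD partitions by hand;
    # this mounts and starts anything the boot-time udev run missed:
    ceph-disk activate-all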
Re: [ceph-users] Very slow start of osds after reboot
Hi,

I have RAID0 for each disk; unfortunately my RAID controller doesn't support
JBOD. Apart from this, I also run a separate cluster with Jewel 10.2.9 on
RAID0 and there is no such problem (I just tested it). Moreover, the cluster
that has this issue used to run Firefly on RAID0 and everything was fine.

On 31.08.2017 at 18:14, Hervé Ballans wrote:
> Hi Piotr,
> Just to verify one point: how are your disks connected (physically), in
> non-RAID or RAID0 mode?
> rv
>
> On 31/08/2017 at 16:24, Piotr Dzionek wrote:
>> For the last 3 weeks I have been running the latest LTS Luminous Ceph
>> release on CentOS 7. It started with the 4th RC and now I have the stable
>> release. The cluster runs fine, however I noticed that if I reboot one of
>> the nodes, it takes a really long time for the cluster to get back to OK
>> status. The OSDs do start, but not as soon as the server is up; they come
>> up one by one over a period of about 5 minutes. I checked the logs and
>> all OSDs have the following errors.

Those are two different systemd units, ceph.target and ceph-osd.target, so
they each have their own dependency listing. So I guess this is nothing
weird. I tested it on Jewel 10.2.9 and I don't see this issue there.

On 31.08.2017 at 17:13, Dan van der Ster wrote:
> Random theory... I just noticed that the ceph-osd's are listed twice [1]
> in the output of systemctl list-dependencies. Is that correct?!!!
>
> -- dan
>
> [1] > systemctl list-dependencies
> [... systemctl list-dependencies output snipped; see Dan's original mail ...]
>
> On Thu, Aug 31, 2017 at 4:57 PM, Dan van der Ster wrote:
>> Hi,
>>
>> I see the same with jewel on el7 -- it started with one of the recent
>> point releases, around ~10.2.5, IIRC.
>>
>> Problem seems to be the same -- the daemon is started before the osd is
>> mounted... then the service waits several seconds before trying again.
>>
>> [... journal log snipped; see Dan's original mail ...]
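P.S. A quick way to confirm it is only the listing and not duplicated wants
symlinks (a sketch; osd.89 is just an example instance taken from the
listing):

    # ceph.target pulls in ceph-osd.target, which is why list-dependencies
    # shows every ceph-osd@ instance under both parents:
    systemctl show -p Wants ceph.target
    systemctl show -p Wants ceph-osd.target

    # Reverse dependencies and the actual wants symlinks for one instance:
    systemctl list-dependencies --reverse ceph-osd@89.service
    ls /etc/systemd/system/ceph-osd.target.wants/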
Re: [ceph-users] Very slow start of osds after reboot
Hi Piotr,

Just to verify one point: how are your disks connected (physically), in
non-RAID or RAID0 mode?

rv

On 31/08/2017 at 16:24, Piotr Dzionek wrote:
> For the last 3 weeks I have been running the latest LTS Luminous Ceph
> release on CentOS 7. It started with the 4th RC and now I have the stable
> release. The cluster runs fine, however I noticed that if I reboot one of
> the nodes, it takes a really long time for the cluster to get back to OK
> status. The OSDs do start, but not as soon as the server is up; they come
> up one by one over a period of about 5 minutes. I checked the logs and all
> OSDs have the following errors.
Re: [ceph-users] Very slow start of osds after reboot
Random theory... I just noticed that the ceph-osd's are listed twice [1] in
the output of systemctl list-dependencies. Is that correct?!!!

-- dan

[1] > systemctl list-dependencies
...
● ├─ceph-mds.target
● ├─ceph-mon.target
● ├─ceph-osd.target
● │ ├─ceph-osd@48.service
● │ ├─ceph-osd@49.service
● │ ├─ceph-osd@50.service
● │ ├─ceph-osd@51.service
● │ ├─ceph-osd@53.service
● │ ├─ceph-osd@54.service
● │ ├─ceph-osd@55.service
● │ ├─ceph-osd@56.service
● │ ├─ceph-osd@59.service
● │ ├─ceph-osd@61.service
● │ ├─ceph-osd@63.service
● │ ├─ceph-osd@65.service
● │ ├─ceph-osd@68.service
● │ ├─ceph-osd@70.service
● │ ├─ceph-osd@74.service
● │ ├─ceph-osd@80.service
● │ ├─ceph-osd@81.service
● │ ├─ceph-osd@82.service
● │ ├─ceph-osd@83.service
● │ ├─ceph-osd@84.service
● │ ├─ceph-osd@89.service
● │ ├─ceph-osd@90.service
● │ ├─ceph-osd@91.service
● │ └─ceph-osd@92.service
● ├─ceph.target
● │ ├─ceph-mds.target
● │ ├─ceph-mon.target
● │ └─ceph-osd.target
● │ ├─ceph-osd@48.service
● │ ├─ceph-osd@49.service
● │ ├─ceph-osd@50.service
● │ ├─ceph-osd@51.service
● │ ├─ceph-osd@53.service
● │ ├─ceph-osd@54.service
● │ ├─ceph-osd@55.service
● │ ├─ceph-osd@56.service
● │ ├─ceph-osd@59.service
● │ ├─ceph-osd@61.service
● │ ├─ceph-osd@63.service
● │ ├─ceph-osd@65.service
● │ ├─ceph-osd@68.service
● │ ├─ceph-osd@70.service
● │ ├─ceph-osd@74.service
● │ ├─ceph-osd@80.service
● │ ├─ceph-osd@81.service
● │ ├─ceph-osd@82.service
● │ ├─ceph-osd@83.service
● │ ├─ceph-osd@84.service
● │ ├─ceph-osd@89.service
● │ ├─ceph-osd@90.service
● │ ├─ceph-osd@91.service
● │ └─ceph-osd@92.service
● ├─getty.target
...

On Thu, Aug 31, 2017 at 4:57 PM, Dan van der Ster wrote:
> Hi,
>
> I see the same with jewel on el7 -- it started with one of the recent
> point releases, around ~10.2.5, IIRC.
>
> Problem seems to be the same -- the daemon is started before the osd is
> mounted... then the service waits several seconds before trying again.
>
> Aug 31 15:41:47 ceph-osd: 2017-08-31 15:41:47.267661 7f2e49731800 -1 #033[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-89: (2) No such file or directory#033[0m
> Aug 31 15:41:47 ceph-osd: starting osd.55 at :/0 osd_data /var/lib/ceph/osd/ceph-55 /var/lib/ceph/osd/ceph-55/journal
> Aug 31 15:41:47 systemd: ceph-osd@89.service: main process exited, code=exited, status=1/FAILURE
> Aug 31 15:41:47 systemd: Unit ceph-osd@89.service entered failed state.
> Aug 31 15:41:47 systemd: ceph-osd@89.service failed.
> Aug 31 15:41:47 kernel: XFS (sdi1): Ending clean mount
> Aug 31 15:41:47 rc.local: Removed symlink /etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service.
> Aug 31 15:41:47 systemd: Reloading.
> Aug 31 15:41:47 systemd: Reloading.
> Aug 31 15:41:47 rc.local: Created symlink from /etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service to /usr/lib/systemd/system/ceph-osd@.service.
> Aug 31 15:41:47 systemd: Reloading.
> Aug 31 15:41:55 ceph-osd: 2017-08-31 15:41:55.425566 7f74b92e1800 -1 osd.55 123659 log_to_monitors {default=true}
> Aug 31 15:42:07 systemd: ceph-osd@84.service holdoff time over, scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@61.service holdoff time over, scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@83.service holdoff time over, scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@80.service holdoff time over, scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@70.service holdoff time over, scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@65.service holdoff time over, scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@82.service holdoff time over, scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@89.service holdoff time over, scheduling restart.
>
> -- Dan
>
> On Thu, Aug 31, 2017 at 4:24 PM, Piotr Dzionek wrote:
>> Hi,
>>
>> For the last 3 weeks I have been running the latest LTS Luminous Ceph
>> release on CentOS 7. It started with the 4th RC and now I have the stable
>> release.
>> The cluster runs fine, however I noticed that if I reboot one of the
>> nodes, it takes a really long time for the cluster to get back to OK
>> status.
>> The OSDs do start, but not as soon as the server is up. They come up one
>> by one over a period of about 5 minutes. I checked the logs and all OSDs
>> have the following errors.
>>
>> 2017-08-30 15:27:52.541366 7f7dabd0d700 30 Event(0x7f7dbc9f4a80 nevent=5000 time_id=62).process_events event_wq process is 11 mask is 1
>> 2017-08-30 15:51:03.639222 7faf11c3ed00  0 set uid:gid to 167:167 (ceph:ceph)
>> 2017-08-30 15:51:03.639342 7faf11c3ed00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 3037
>> 2017-08-30 15:51:03.672898 7faf11c3ed00 -1 ESC[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or directoryESC[0m
>> 2017-08-30 15:51:42.453334 7f9f55f11d00  0 set uid:gid to 167:167
>> [...]
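P.S. If someone wants to see where the boot time actually goes for one of
these OSDs, systemd's own tooling should show the ordering and the retries
(a sketch; osd.89 is just an example instance id):

    # Slowest units of the last boot:
    systemd-analyze blame | head -20

    # What a single OSD instance waited on and when it became active:
    systemd-analyze critical-chain ceph-osd@89.service

    # The failed attempts and holdoff restarts for that unit across the boot:
    journalctl -b -u ceph-osd@89.service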
Re: [ceph-users] Very slow start of osds after reboot
Hi,

I see the same with jewel on el7 -- it started with one of the recent point
releases, around ~10.2.5, IIRC.

The problem seems to be the same -- the daemon is started before the osd is
mounted... then the service waits several seconds (the systemd holdoff time)
before trying again.

Aug 31 15:41:47 ceph-osd: 2017-08-31 15:41:47.267661 7f2e49731800 -1 #033[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-89: (2) No such file or directory#033[0m
Aug 31 15:41:47 ceph-osd: starting osd.55 at :/0 osd_data /var/lib/ceph/osd/ceph-55 /var/lib/ceph/osd/ceph-55/journal
Aug 31 15:41:47 systemd: ceph-osd@89.service: main process exited, code=exited, status=1/FAILURE
Aug 31 15:41:47 systemd: Unit ceph-osd@89.service entered failed state.
Aug 31 15:41:47 systemd: ceph-osd@89.service failed.
Aug 31 15:41:47 kernel: XFS (sdi1): Ending clean mount
Aug 31 15:41:47 rc.local: Removed symlink /etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:47 rc.local: Created symlink from /etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service to /usr/lib/systemd/system/ceph-osd@.service.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:55 ceph-osd: 2017-08-31 15:41:55.425566 7f74b92e1800 -1 osd.55 123659 log_to_monitors {default=true}
Aug 31 15:42:07 systemd: ceph-osd@84.service holdoff time over, scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@61.service holdoff time over, scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@83.service holdoff time over, scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@80.service holdoff time over, scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@70.service holdoff time over, scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@65.service holdoff time over, scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@82.service holdoff time over, scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@89.service holdoff time over, scheduling restart.

-- Dan

On Thu, Aug 31, 2017 at 4:24 PM, Piotr Dzionek wrote:
> Hi,
>
> For the last 3 weeks I have been running the latest LTS Luminous Ceph
> release on CentOS 7. It started with the 4th RC and now I have the stable
> release.
> The cluster runs fine, however I noticed that if I reboot one of the
> nodes, it takes a really long time for the cluster to get back to OK
> status.
> The OSDs do start, but not as soon as the server is up. They come up one
> by one over a period of about 5 minutes. I checked the logs and all OSDs
> have the following errors.
>
> 2017-08-30 15:27:52.541366 7f7dabd0d700 30 Event(0x7f7dbc9f4a80 nevent=5000 time_id=62).process_events event_wq process is 11 mask is 1
> 2017-08-30 15:51:03.639222 7faf11c3ed00  0 set uid:gid to 167:167 (ceph:ceph)
> 2017-08-30 15:51:03.639342 7faf11c3ed00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 3037
> 2017-08-30 15:51:03.672898 7faf11c3ed00 -1 ESC[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or directoryESC[0m
> 2017-08-30 15:51:42.453334 7f9f55f11d00  0 set uid:gid to 167:167 (ceph:ceph)
> 2017-08-30 15:51:42.453352 7f9f55f11d00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 7366
> 2017-08-30 15:51:42.453590 7f9f55f11d00 -1 ESC[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or directoryESC[0m
> 2017-08-30 15:52:03.199062 7effa00cad00  0 set uid:gid to 167:167 (ceph:ceph)
> 2017-08-30 15:52:03.199081 7effa00cad00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 7747
> 2017-08-30 15:52:03.199323 7effa00cad00 -1 ESC[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or directoryESC[0m
> 2017-08-30 15:52:23.967466 7ff008c2cd00  0 set uid:gid to 167:167 (ceph:ceph)
> 2017-08-30 15:52:23.967483 7ff008c2cd00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 8016
> 2017-08-30 15:52:23.967714 7ff008c2cd00 -1 ESC[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or directoryESC[0m
> 2017-08-30 15:52:44.716646 7fc2bd322d00  0 set uid:gid to 167:167 (ceph:ceph)
> 2017-08-30 15:52:44.716664 7fc2bd322d00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 8808
> 2017-08-30 15:52:44.716892 7fc2bd322d00 -1 ESC[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or directoryESC[0m
> 2017-08-30 15:53:06.214611 7f4583e70d00  0 set uid:gid to 167:167 (ceph:ceph)
> 2017-08-30 15:53:06.214629 7f4583e70d00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 9184
> 2017-08-30 15:53:06.214855 7f4583e70d00 -1 ESC[0;31m ** ERROR:
> [...]
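P.S. If it really is only the holdoff between retries that adds up to those
~5 minutes, a drop-in lowering RestartSec might paper over it until the real
race (the daemon starting before ceph-disk/udev has mounted the data
partition) is addressed. Untested sketch only; the value is arbitrary and not
a recommendation:

    # Drop-in for all ceph-osd@ instances shortening the restart holdoff:
    mkdir -p /etc/systemd/system/ceph-osd@.service.d
    printf '[Service]\nRestartSec=5s\n' \
        > /etc/systemd/system/ceph-osd@.service.d/10-holdoff.conf
    systemctl daemon-reload
    # Note: repeated quick failures could then hit the unit's start limit
    # (StartLimitBurst/StartLimitInterval), so that may need relaxing too.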
Re: [ceph-users] Very slow start of osds after reboot
Datapoint: I have the same issue on 12.1.1, three nodes, 6 disks per node.

On Thu, 31 Aug 2017, Piotr Dzionek said:
> For the last 3 weeks I have been running the latest LTS Luminous Ceph
> release on CentOS 7. It started with the 4th RC and now I have the stable
> release.
> The cluster runs fine, however I noticed that if I reboot one of the
> nodes, it takes a really long time for the cluster to get back to OK
> status.
> The OSDs do start, but not as soon as the server is up. They come up one
> by one over a period of about 5 minutes. I checked the logs and all OSDs
> have the following errors.