Re: [ceph-users] Very slow start of osds after reboot

2017-09-20 Thread Manuel Lausch
Hi, 

I have the same issue with Ceph Jewel (10.2.9), Red Hat 7 and dmcrypt.
Is there any fix, or at least a workaround, available?
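
For reference, this is roughly what I am looking at on one of the affected
hosts to see where the time goes (osd.12 below is just a placeholder for one
of the dmcrypt OSDs on that box, nothing specific to my setup):

  lsblk                                  # are the dm-crypt mappings and XFS mounts there yet?
  ceph-disk list                         # which data partitions ceph-disk has prepared/activated
  journalctl -b -u ceph-osd@12.service   # when the daemon started relative to the mount appearing

So far it looks like the same pattern described further down the thread: the
daemons are started before udev/ceph-disk has finished setting up and
mounting the data partitions.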

Regards,
Manuel 


On Thu, 31 Aug 2017 16:24:10 +0200, Piotr Dzionek wrote:

> Hi,
> 
> For the last 3 weeks I have been running the latest LTS Luminous Ceph
> release on CentOS 7. It started with the 4th RC and now I am on the stable
> release. The cluster runs fine; however, I noticed that if I reboot one of
> the nodes, it takes a really long time for the cluster to get back to an OK
> status. The OSDs do start, but not as soon as the server is up: they come
> up one by one over a period of about 5 minutes. I checked the logs and all
> OSDs show the following errors.
> 
> 
> 
> As you can see, the XFS volume (the part with the metadata) is not mounted
> yet. My question is: what mounts it, and why does it take so long?
> Maybe there is a setting that randomizes the start-up of the OSDs
> running on the same node?
> 
> Kind regards,
> Piotr Dzionek



-- 
Manuel Lausch

Systemadministrator
Cloud Services

1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 |
76135 Karlsruhe | Germany
Phone: +49 721 91374-1847
E-Mail: manuel.lau...@1und1.de | Web: www.1und1.de

Amtsgericht Montabaur, HRB 5452

Geschäftsführer: Thomas Ludwig, Jan Oetjen


Member of United Internet


This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient of this e-mail, you are hereby
notified that saving, distribution or use of the content of this e-mail
in any way is prohibited. If you have received this e-mail in error,
please notify the sender and delete the e-mail.


Re: [ceph-users] Very slow start of osds after reboot

2017-09-01 Thread Piotr Dzionek

Hi,

I have RAID0 configured per disk; unfortunately my RAID controller doesn't
support JBOD. Apart from this, I also run a separate cluster with Jewel
10.2.9 on RAID0 and there is no such problem (I just tested it). Moreover,
the cluster that has this issue used to run Firefly with RAID0 and
everything was fine.



On 31.08.2017 at 18:14, Hervé Ballans wrote:

Hi Piotr,

Just to verify one point: how are your disks connected (physically),
in non-RAID or RAID0 mode?


rv

On 31/08/2017 at 16:24, Piotr Dzionek wrote:
For the last 3 weeks I have been running the latest LTS Luminous Ceph
release on CentOS 7. It started with the 4th RC and now I am on the stable
release.
The cluster runs fine; however, I noticed that if I reboot one of the
nodes, it takes a really long time for the cluster to get back to an OK status.
The OSDs do start, but not as soon as the server is up: they come up
one by one over a period of about 5 minutes. I checked the logs and
all OSDs show the following errors.







Those are two different systemd units, ceph-osd.target and ceph.target,
so they each have their own dependency listing. So I guess this is nothing weird.
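
A quick way to confirm that this is just target nesting, I think, is to look
at what ceph.target itself wants (this is standard systemd, nothing
Ceph-specific):

  systemctl show -p Wants ceph.target

If that lists ceph-osd.target (plus the mon/mds targets), then the OSD units
only appear twice in the tree because ceph.target pulls in ceph-osd.target,
not because they are enabled in two places.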



On 31.08.2017 at 17:13, Dan van der Ster wrote:

Random theory... I just noticed that the ceph-osd units are listed twice
[1] in the output of systemctl list-dependencies.

Is that correct?!!!

-- dan

[1] > systemctl list-dependencies
...
● ├─ceph-mds.target
● ├─ceph-mon.target
● ├─ceph-osd.target
● │ ├─ceph-osd@48.service
● │ ├─ceph-osd@49.service
● │ ├─ceph-osd@50.service
● │ ├─ceph-osd@51.service
● │ ├─ceph-osd@53.service
● │ ├─ceph-osd@54.service
● │ ├─ceph-osd@55.service
● │ ├─ceph-osd@56.service
● │ ├─ceph-osd@59.service
● │ ├─ceph-osd@61.service
● │ ├─ceph-osd@63.service
● │ ├─ceph-osd@65.service
● │ ├─ceph-osd@68.service
● │ ├─ceph-osd@70.service
● │ ├─ceph-osd@74.service
● │ ├─ceph-osd@80.service
● │ ├─ceph-osd@81.service
● │ ├─ceph-osd@82.service
● │ ├─ceph-osd@83.service
● │ ├─ceph-osd@84.service
● │ ├─ceph-osd@89.service
● │ ├─ceph-osd@90.service
● │ ├─ceph-osd@91.service
● │ └─ceph-osd@92.service
● ├─ceph.target
● │ ├─ceph-mds.target
● │ ├─ceph-mon.target
● │ └─ceph-osd.target
● │   ├─ceph-osd@48.service
● │   ├─ceph-osd@49.service
● │   ├─ceph-osd@50.service
● │   ├─ceph-osd@51.service
● │   ├─ceph-osd@53.service
● │   ├─ceph-osd@54.service
● │   ├─ceph-osd@55.service
● │   ├─ceph-osd@56.service
● │   ├─ceph-osd@59.service
● │   ├─ceph-osd@61.service
● │   ├─ceph-osd@63.service
● │   ├─ceph-osd@65.service
● │   ├─ceph-osd@68.service
● │   ├─ceph-osd@70.service
● │   ├─ceph-osd@74.service
● │   ├─ceph-osd@80.service
● │   ├─ceph-osd@81.service
● │   ├─ceph-osd@82.service
● │   ├─ceph-osd@83.service
● │   ├─ceph-osd@84.service
● │   ├─ceph-osd@89.service
● │   ├─ceph-osd@90.service
● │   ├─ceph-osd@91.service
● │   └─ceph-osd@92.service
● ├─getty.target
...



I tested it on jewel 10.2.9 and I don't see this issue here.


On Thu, Aug 31, 2017 at 4:57 PM, Dan van der Ster  wrote:

Hi,

I see the same with jewel on el7 -- it started with one of the recent point
releases, around ~10.2.5, IIRC.

The problem seems to be the same -- the daemon is started before the OSD is
mounted... then the service waits several seconds before trying again.

Aug 31 15:41:47 ceph-osd: 2017-08-31 15:41:47.267661 7f2e49731800 -1
#033[0;31m ** ERROR: unable to open OSD superblock on
/var/lib/ceph/osd/ceph-89: (2) No such file or directory#033[0m
Aug 31 15:41:47 ceph-osd: starting osd.55 at :/0 osd_data
/var/lib/ceph/osd/ceph-55 /var/lib/ceph/osd/ceph-55/journal
Aug 31 15:41:47 systemd: ceph-osd@89.service: main process exited,
code=exited, status=1/FAILURE
Aug 31 15:41:47 systemd: Unit ceph-osd@89.service entered failed state.
Aug 31 15:41:47 systemd: ceph-osd@89.service failed.
Aug 31 15:41:47 kernel: XFS (sdi1): Ending clean mount
Aug 31 15:41:47 rc.local: Removed symlink
/etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:47 rc.local: Created symlink from
/etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service to
/usr/lib/systemd/system/ceph-osd@.service.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:55 ceph-osd: 2017-08-31 15:41:55.425566 7f74b92e1800 -1
osd.55 123659 log_to_monitors {default=true}
Aug 31 15:42:07 systemd: ceph-osd@84.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@61.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@83.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@80.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@70.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@65.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@82.service holdoff time over,

Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Hervé Ballans

Hi Piotr,

Just to verify one point: how are your disks connected (physically),
in non-RAID or RAID0 mode?


rv

On 31/08/2017 at 16:24, Piotr Dzionek wrote:
For the last 3 weeks I have been running the latest LTS Luminous Ceph
release on CentOS 7. It started with the 4th RC and now I am on the stable release.
The cluster runs fine; however, I noticed that if I reboot one of the
nodes, it takes a really long time for the cluster to get back to an OK status.
The OSDs do start, but not as soon as the server is up: they come up one
by one over a period of about 5 minutes. I checked the logs and all
OSDs show the following errors.





Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Dan van der Ster
Random theory... I just noticed that the ceph-osd units are listed twice
[1] in the output of systemctl list-dependencies.

Is that correct?!!!

-- dan

[1] > systemctl list-dependencies
...
● ├─ceph-mds.target
● ├─ceph-mon.target
● ├─ceph-osd.target
● │ ├─ceph-osd@48.service
● │ ├─ceph-osd@49.service
● │ ├─ceph-osd@50.service
● │ ├─ceph-osd@51.service
● │ ├─ceph-osd@53.service
● │ ├─ceph-osd@54.service
● │ ├─ceph-osd@55.service
● │ ├─ceph-osd@56.service
● │ ├─ceph-osd@59.service
● │ ├─ceph-osd@61.service
● │ ├─ceph-osd@63.service
● │ ├─ceph-osd@65.service
● │ ├─ceph-osd@68.service
● │ ├─ceph-osd@70.service
● │ ├─ceph-osd@74.service
● │ ├─ceph-osd@80.service
● │ ├─ceph-osd@81.service
● │ ├─ceph-osd@82.service
● │ ├─ceph-osd@83.service
● │ ├─ceph-osd@84.service
● │ ├─ceph-osd@89.service
● │ ├─ceph-osd@90.service
● │ ├─ceph-osd@91.service
● │ └─ceph-osd@92.service
● ├─ceph.target
● │ ├─ceph-mds.target
● │ ├─ceph-mon.target
● │ └─ceph-osd.target
● │   ├─ceph-osd@48.service
● │   ├─ceph-osd@49.service
● │   ├─ceph-osd@50.service
● │   ├─ceph-osd@51.service
● │   ├─ceph-osd@53.service
● │   ├─ceph-osd@54.service
● │   ├─ceph-osd@55.service
● │   ├─ceph-osd@56.service
● │   ├─ceph-osd@59.service
● │   ├─ceph-osd@61.service
● │   ├─ceph-osd@63.service
● │   ├─ceph-osd@65.service
● │   ├─ceph-osd@68.service
● │   ├─ceph-osd@70.service
● │   ├─ceph-osd@74.service
● │   ├─ceph-osd@80.service
● │   ├─ceph-osd@81.service
● │   ├─ceph-osd@82.service
● │   ├─ceph-osd@83.service
● │   ├─ceph-osd@84.service
● │   ├─ceph-osd@89.service
● │   ├─ceph-osd@90.service
● │   ├─ceph-osd@91.service
● │   └─ceph-osd@92.service
● ├─getty.target
...
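
(I have not dug into it yet, but a sanity check would presumably be to look
at the reverse dependencies of one instance -- ceph-osd@48 below is just the
first one from the list above:

> systemctl list-dependencies --reverse ceph-osd@48.service

If that only shows ceph-osd.target, with ceph.target above it, then the
double listing is just ceph.target pulling in ceph-osd.target rather than
the OSD units being wanted from two places.)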



On Thu, Aug 31, 2017 at 4:57 PM, Dan van der Ster  wrote:
> Hi,
>
> I see the same with jewel on el7 -- it started with one of the recent point
> releases, around ~10.2.5, IIRC.
>
> The problem seems to be the same -- the daemon is started before the OSD is
> mounted... then the service waits several seconds before trying again.
>
> Aug 31 15:41:47 ceph-osd: 2017-08-31 15:41:47.267661 7f2e49731800 -1
> #033[0;31m ** ERROR: unable to open OSD superblock on
> /var/lib/ceph/osd/ceph-89: (2) No such file or directory#033[0m
> Aug 31 15:41:47 ceph-osd: starting osd.55 at :/0 osd_data
> /var/lib/ceph/osd/ceph-55 /var/lib/ceph/osd/ceph-55/journal
> Aug 31 15:41:47 systemd: ceph-osd@89.service: main process exited,
> code=exited, status=1/FAILURE
> Aug 31 15:41:47 systemd: Unit ceph-osd@89.service entered failed state.
> Aug 31 15:41:47 systemd: ceph-osd@89.service failed.
> Aug 31 15:41:47 kernel: XFS (sdi1): Ending clean mount
> Aug 31 15:41:47 rc.local: Removed symlink
> /etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service.
> Aug 31 15:41:47 systemd: Reloading.
> Aug 31 15:41:47 systemd: Reloading.
> Aug 31 15:41:47 rc.local: Created symlink from
> /etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service to
> /usr/lib/systemd/system/ceph-osd@.service.
> Aug 31 15:41:47 systemd: Reloading.
> Aug 31 15:41:55 ceph-osd: 2017-08-31 15:41:55.425566 7f74b92e1800 -1
> osd.55 123659 log_to_monitors {default=true}
> Aug 31 15:42:07 systemd: ceph-osd@84.service holdoff time over,
> scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@61.service holdoff time over,
> scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@83.service holdoff time over,
> scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@80.service holdoff time over,
> scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@70.service holdoff time over,
> scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@65.service holdoff time over,
> scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@82.service holdoff time over,
> scheduling restart.
> Aug 31 15:42:07 systemd: ceph-osd@89.service holdoff time over,
> scheduling restart.
>
>
> -- Dan
>
>
>
> On Thu, Aug 31, 2017 at 4:24 PM, Piotr Dzionek  wrote:
>> Hi,
>>
>> For the last 3 weeks I have been running the latest LTS Luminous Ceph release
>> on CentOS 7. It started with the 4th RC and now I am on the stable release.
>> The cluster runs fine; however, I noticed that if I reboot one of the nodes,
>> it takes a really long time for the cluster to get back to an OK status.
>> The OSDs do start, but not as soon as the server is up: they come up one by
>> one over a period of about 5 minutes. I checked the logs and all OSDs show
>> the following errors.
>>
>> 2017-08-30 15:27:52.541366 7f7dabd0d700 30 Event(0x7f7dbc9f4a80 nevent=5000
>> time_id=62).process_events event_wq process is 11 mask is 1
>> 2017-08-30 15:51:03.639222 7faf11c3ed00  0 set uid:gid to 167:167
>> (ceph:ceph)
>> 2017-08-30 15:51:03.639342 7faf11c3ed00  0 ceph version 12.2.0
>> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown),
>> pid 3037
>> 2017-08-30 15:51:03.672898 7faf11c3ed00 -1 ESC[0;31m ** ERROR: unable to
>> open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or
>> directoryESC[0m
>> 2017-08-30 15:51:42.453334 7f9f55f11d00  0 set uid:gid to 167:167
>> 

Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Dan van der Ster
Hi,

I see the same with jewel on el7 -- it started with one of the recent point
releases, around ~10.2.5, IIRC.

The problem seems to be the same -- the daemon is started before the OSD is
mounted... then the service waits several seconds before trying again.

Aug 31 15:41:47 ceph-osd: 2017-08-31 15:41:47.267661 7f2e49731800 -1
#033[0;31m ** ERROR: unable to open OSD superblock on
/var/lib/ceph/osd/ceph-89: (2) No such file or directory#033[0m
Aug 31 15:41:47 ceph-osd: starting osd.55 at :/0 osd_data
/var/lib/ceph/osd/ceph-55 /var/lib/ceph/osd/ceph-55/journal
Aug 31 15:41:47 systemd: ceph-osd@89.service: main process exited,
code=exited, status=1/FAILURE
Aug 31 15:41:47 systemd: Unit ceph-osd@89.service entered failed state.
Aug 31 15:41:47 systemd: ceph-osd@89.service failed.
Aug 31 15:41:47 kernel: XFS (sdi1): Ending clean mount
Aug 31 15:41:47 rc.local: Removed symlink
/etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:47 rc.local: Created symlink from
/etc/systemd/system/ceph-osd.target.wants/ceph-osd@54.service to
/usr/lib/systemd/system/ceph-osd@.service.
Aug 31 15:41:47 systemd: Reloading.
Aug 31 15:41:55 ceph-osd: 2017-08-31 15:41:55.425566 7f74b92e1800 -1
osd.55 123659 log_to_monitors {default=true}
Aug 31 15:42:07 systemd: ceph-osd@84.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@61.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@83.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@80.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@70.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@65.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@82.service holdoff time over,
scheduling restart.
Aug 31 15:42:07 systemd: ceph-osd@89.service holdoff time over,
scheduling restart.
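
For what it's worth, the gap between the failure at 15:41:47 and the "holdoff
time over" messages at 15:42:07 above presumably comes from the unit's
Restart= / RestartSec= settings rather than from ceph itself. A rough way to
check (ceph-osd@89 taken from the log above) would be:

  systemctl cat ceph-osd@89.service
  systemctl show ceph-osd@89.service | grep -i restart

So each OSD just keeps failing on the missing superblock and gets restarted
after every holdoff period until its XFS mount finally shows up.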


-- Dan



On Thu, Aug 31, 2017 at 4:24 PM, Piotr Dzionek  wrote:
> Hi,
>
> For the last 3 weeks I have been running the latest LTS Luminous Ceph release
> on CentOS 7. It started with the 4th RC and now I am on the stable release.
> The cluster runs fine; however, I noticed that if I reboot one of the nodes,
> it takes a really long time for the cluster to get back to an OK status.
> The OSDs do start, but not as soon as the server is up: they come up one by
> one over a period of about 5 minutes. I checked the logs and all OSDs show
> the following errors.
>
> 2017-08-30 15:27:52.541366 7f7dabd0d700 30 Event(0x7f7dbc9f4a80 nevent=5000
> time_id=62).process_events event_wq process is 11 mask is 1
> 2017-08-30 15:51:03.639222 7faf11c3ed00  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-08-30 15:51:03.639342 7faf11c3ed00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown),
> pid 3037
> 2017-08-30 15:51:03.672898 7faf11c3ed00 -1 ESC[0;31m ** ERROR: unable to
> open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or
> directoryESC[0m
> 2017-08-30 15:51:42.453334 7f9f55f11d00  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-08-30 15:51:42.453352 7f9f55f11d00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown),
> pid 7366
> 2017-08-30 15:51:42.453590 7f9f55f11d00 -1 ESC[0;31m ** ERROR: unable to
> open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or
> directoryESC[0m
> 2017-08-30 15:52:03.199062 7effa00cad00  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-08-30 15:52:03.199081 7effa00cad00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown),
> pid 7747
> 2017-08-30 15:52:03.199323 7effa00cad00 -1 ESC[0;31m ** ERROR: unable to
> open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or
> directoryESC[0m
> 2017-08-30 15:52:23.967466 7ff008c2cd00  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-08-30 15:52:23.967483 7ff008c2cd00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown),
> pid 8016
> 2017-08-30 15:52:23.967714 7ff008c2cd00 -1 ESC[0;31m ** ERROR: unable to
> open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or
> directoryESC[0m
> 2017-08-30 15:52:44.716646 7fc2bd322d00  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-08-30 15:52:44.716664 7fc2bd322d00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown),
> pid 8808
> 2017-08-30 15:52:44.716892 7fc2bd322d00 -1 ESC[0;31m ** ERROR: unable to
> open OSD superblock on /var/lib/ceph/osd/ceph-27: (2) No such file or
> directoryESC[0m
> 2017-08-30 15:53:06.214611 7f4583e70d00  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-08-30 15:53:06.214629 7f4583e70d00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown),
> pid 9184
> 2017-08-30 15:53:06.214855 7f4583e70d00 -1 ESC[0;31m ** ERROR: 

Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Sean Purdy
Datapoint: I have the same issue on 12.1.1, three nodes, 6 disks per node.

On Thu, 31 Aug 2017, Piotr Dzionek said:
> For the last 3 weeks I have been running the latest LTS Luminous Ceph release
> on CentOS 7. It started with the 4th RC and now I am on the stable release.
> The cluster runs fine; however, I noticed that if I reboot one of the nodes,
> it takes a really long time for the cluster to get back to an OK status.
> The OSDs do start, but not as soon as the server is up: they come up one by
> one over a period of about 5 minutes. I checked the logs and all OSDs show
> the following errors.