Thanks for testing. That should rule out udev as the cause of the race.
A couple of observations from the log:
* There is a loop for each OSD that calls 'ceph-volume lvm trigger',
retrying up to 30 times until the OSD is activated; for example, for osd.4:
[2019-05-31 01:27:29,235][ceph_volume.process][INFO ] Running command:
ceph-volume lvm trigger 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,435][ceph_volume.process][INFO ] stderr -->
RuntimeError: could not find osd.4 with fsid
7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,530][systemd][WARNING] command returned non-zero exit
status: 1
[2019-05-31 01:27:35,531][systemd][WARNING] failed activating OSD, retries
left: 30
[2019-05-31 01:27:44,122][ceph_volume.process][INFO ] stderr -->
RuntimeError: could not find osd.4 with fsid
7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:44,174][systemd][WARNING] command returned non-zero exit
status: 1
[2019-05-31 01:27:44,175][systemd][WARNING] failed activating OSD, retries
left: 29
...
I wonder if we could add similar 'ceph-volume lvm trigger' calls for the
WAL and DB devices of each OSD, or perhaps another call with a similar
goal. Does that even make sense? We should be able to determine whether an
OSD has a DB or WAL device from the LVM tags.
* The first 3 OSDs to be activated are 18, 4, and 11, and they are the same
3 that are missing block.db/block.wal symlinks. That's further confirmation
that this is a race:
[2019-05-31 01:28:03,370][systemd][INFO ] successfully trggered activation
for: 18-eb5270dc-1110-420f-947e-aab7fae299c9
[2019-05-31 01:28:12,354][systemd][INFO ] successfully trggered activation
for: 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:28:12,530][systemd][INFO ] successfully trggered activation
for: 11-33de740d-bd8c-4b47-a601-3e6e634e489a
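
To make the idea in the first point a bit more concrete, here is a rough
Python sketch. It is not ceph-volume code. It assumes the LV tags carry
'ceph.db_device' and 'ceph.wal_device' entries holding device paths, and
the helper names (parse_lv_tags, extra_devices, wait_for_devices) are made
up for illustration. The retry count mirrors the 30 retries seen in the
log; the interval is an arbitrary guess.

```python
import os
import time


def parse_lv_tags(lv_tags):
    """Turn a comma-separated 'key=value' tag string into a dict."""
    tags = {}
    for item in lv_tags.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip empty or malformed entries
            tags[key.strip()] = value.strip()
    return tags


def extra_devices(tags):
    """Return {'db': path, 'wal': path} for the roles present in the tags.

    Assumption: the tags are named ceph.db_device and ceph.wal_device.
    """
    wanted = {}
    for role in ("db", "wal"):
        path = tags.get("ceph.%s_device" % role)
        if path:
            wanted[role] = path
    return wanted


def wait_for_devices(paths, tries=30, interval=5,
                     exists=os.path.exists, sleep=time.sleep):
    """Poll until every path exists; return the paths still missing."""
    missing = list(paths)
    for attempt in range(tries):
        missing = [p for p in paths if not exists(p)]
        if not missing:
            break
        if attempt < tries - 1:
            sleep(interval)
    return missing
```

The thought would be to run something like wait_for_devices() on the
values of extra_devices() before activating an OSD, and to fail (or retry)
activation if anything is still missing, rather than activating without
the block.db/block.wal symlinks.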
--
https://bugs.launchpad.net/bugs/1828617
Title:
Hosts randomly 'losing' disks, breaking ceph-osd service enumeration