I'm seeing this in a slightly different manner, on Bionic/Queens.

We have LVMs encrypted (thanks Vault), and rebooting a host results in
at least one OSD not returning fairly consistently.  The LVs appear in
the list, however the difference between a working and a non-working OSD
is the lack of links to block.db and block.wal on a non-working OSD.

See https://pastebin.canonical.com/p/rW3VgMMkmY/ for some info.

If I made the links manually:

cd /var/lib/ceph/osd/ceph-4
ln -s 
/dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
 block.db
ln -s 
/dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
 block.wal

This resulted in a perms error accessing the device
"bluestore(/var/lib/ceph/osd/ceph-4) _open_db
/var/lib/ceph/osd/ceph-4/block.db symlink exists but target unusable:
(13) Permission denied"

ls -l /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/
total 0
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 
osd-db-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-20
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 
osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-24
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 
osd-db-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-14
lrwxrwxrwx 1 root root 8 May 22 23:04 
osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-12
lrwxrwxrwx 1 root root 8 May 22 23:04 
osd-db-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-22
lrwxrwxrwx 1 root root 8 May 22 23:04 
osd-db-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-18
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 
osd-db-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-16
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 
osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-19
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 
osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-23
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 
osd-wal-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-13
lrwxrwxrwx 1 root root 8 May 22 23:04 
osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-11
lrwxrwxrwx 1 root root 8 May 22 23:04 
osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-21
lrwxrwxrwx 1 root root 8 May 22 23:04 
osd-wal-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-17
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 
osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-15

I tried to change the perms to ceph.ceph ownership, but no change.

I have also tried (using `systemctl edit lvm2-monitor.service`) adding
the following to lvm2, but that's not changed the behavior either:

# cat /etc/systemd/system/lvm2-monitor.service.d/override.conf 
[Service]
ExecStartPre=/bin/sleep 60

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1828617

Title:
  Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  Ubuntu 18.04.2 Ceph deployment.

  Ceph OSD devices utilizing LVM volumes pointing to udev-based physical 
devices.
  LVM module is supposed to create PVs from devices using the links in 
/dev/disk/by-dname/ folder that are created by udev.
  However on reboot it happens (not always, rather like race condition) that 
Ceph services cannot start, and pvdisplay doesn't show any volumes created. The 
folder /dev/disk/by-dname/ however has all necessary device created by the end 
of boot process.

  The behaviour can be fixed manually by running "#/sbin/lvm pvscan
  --cache --activate ay /dev/nvme0n1" command for re-activating the LVM
  components and then the services can be started.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1828617/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to