[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Package changed: systemd (Ubuntu) => ceph (Ubuntu)

https://bugs.launchpad.net/bugs/1828617

Title: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

Status in ceph package in Ubuntu: New

Bug description:
Ubuntu 18.04.2 Ceph deployment. The Ceph OSD devices use LVM volumes on top of udev-provided physical devices. LVM is supposed to create PVs from the links in the /dev/disk/by-dname/ folder, which are created by udev. However, on reboot it sometimes happens (not always; it looks like a race condition) that the Ceph services cannot start and pvdisplay shows no volumes at all. The /dev/disk/by-dname/ folder nevertheless contains all the necessary device links by the end of the boot process.

The behaviour can be fixed manually by running "/sbin/lvm pvscan --cache --activate ay /dev/nvme0n1" to re-activate the LVM components, after which the services can be started.
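A minimal recovery sketch based on that description (/dev/nvme0n1 is just the reporter's example device; substitute the affected PV):

    # Check whether LVM sees any PVs/VGs after boot:
    pvdisplay
    vgs

    # If nothing shows up, re-scan the device and auto-activate its VGs:
    /sbin/lvm pvscan --cache --activate ay /dev/nvme0n1

    # Then the OSD services can be started, e.g.:
    systemctl start ceph-osd.target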
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Thanks for all the details. I need to confirm this, but I think the block.db and block.wal symlinks are created as a result of 'ceph-volume lvm prepare --bluestore --data <data-device> --block.wal <wal-device> --block.db <db-device>'. That's coded in the ceph-osd charm around here: https://opendev.org/openstack/charm-ceph-osd/src/branch/master/lib/ceph/utils.py#L1558

Can you confirm that the symlinks are ok prior to reboot? I'd like to figure out whether they are correctly set up by the charm initially.
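One way to confirm the symlinks before rebooting (a sketch; paths follow the /var/lib/ceph/osd layout shown later in this thread):

    # List every OSD's block/block.db/block.wal symlink and flag
    # any whose target does not (yet) exist:
    for l in /var/lib/ceph/osd/ceph-*/block*; do
        if [ -e "$l" ]; then
            ls -l "$l"
        else
            echo "dangling symlink: $l"
        fi
    done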
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
udevadm info -e >/tmp/1828617-2.out

~# ls -l /var/lib/ceph/osd/ceph*
-rw------- 1 ceph ceph 69 May 21 08:44 /var/lib/ceph/osd/ceph.client.osd-upgrade.keyring

/var/lib/ceph/osd/ceph-11:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-33de740d-bd8c-4b47-a601-3e6e634e489a/osd-block-33de740d-bd8c-4b47-a601-3e6e634e489a
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph  6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph  3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-18:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-eb5270dc-1110-420f-947e-aab7fae299c9/osd-block-eb5270dc-1110-420f-947e-aab7fae299c9
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-eb5270dc-1110-420f-947e-aab7fae299c9
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph  6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph  3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-24:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-d38a7e91-cf06-4607-abbe-53eac89ac5ea/osd-block-d38a7e91-cf06-4607-abbe-53eac89ac5ea
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph  6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph  3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-31:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-053e000a-76ed-427e-98b3-e5373e263f2d/osd-block-053e000a-76ed-427e-98b3-e5373e263f2d
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-053e000a-76ed-427e-98b3-e5373e263f2d
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph  6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph  3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-38:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-c2669da2-63aa-42e2-b049-cf00a478e076/osd-block-c2669da2-63aa-42e2-b049-cf00a478e076
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-c2669da2-63aa-42e2-b049-cf00a478e076
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph  6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph  3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-4:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-7478edfc-f321-40a2-a105-8e8a2c8ca3f6/osd-block-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 55 May 28 22:12 keyring
-rw------- 1 ceph ceph  6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph  2 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-45:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e/osd-block-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph  6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph  3 May 28 22:12 whoami

** Attachment added: "1828617-2.out"
   https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1828617/+attachment/5267247/+files/1828617-2.out
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
journalctl --no-pager -lu systemd-udevd.service >/tmp/1828617-1.out

Hostname obfuscated.

lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 88.4M 1 loop /snap/core/6964
loop1 7:1 0 89.4M 1 loop /snap/core/6818
loop2 7:2 0 8.4M 1 loop /snap/canonical-livepatch/77
sda 8:0 0 1.8T 0 disk
├─sda1 8:1 0 476M 0 part /boot/efi
├─sda2 8:2 0 3.7G 0 part /boot
└─sda3 8:3 0 1.7T 0 part
  └─bcache7 252:896 0 1.7T 0 disk /
sdb 8:16 0 1.8T 0 disk
└─bcache0 252:0 0 1.8T 0 disk
sdc 8:32 0 1.8T 0 disk
└─bcache6 252:768 0 1.8T 0 disk
  └─crypt-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 253:0 0 1.8T 0 crypt
    └─ceph--7478edfc--f321--40a2--a105--8e8a2c8ca3f6-osd--block--7478edfc--f321--40a2--a105--8e8a2c8ca3f6 253:2 0 1.8T 0 lvm
sdd 8:48 0 1.8T 0 disk
└─bcache4 252:512 0 1.8T 0 disk
  └─crypt-33de740d-bd8c-4b47-a601-3e6e634e489a 253:4 0 1.8T 0 crypt
    └─ceph--33de740d--bd8c--4b47--a601--3e6e634e489a-osd--block--33de740d--bd8c--4b47--a601--3e6e634e489a 253:5 0 1.8T 0 lvm
sde 8:64 0 1.8T 0 disk
└─bcache3 252:384 0 1.8T 0 disk
  └─crypt-eb5270dc-1110-420f-947e-aab7fae299c9 253:1 0 1.8T 0 crypt
    └─ceph--eb5270dc--1110--420f--947e--aab7fae299c9-osd--block--eb5270dc--1110--420f--947e--aab7fae299c9 253:3 0 1.8T 0 lvm
sdf 8:80 0 1.8T 0 disk
└─bcache1 252:128 0 1.8T 0 disk
  └─crypt-d38a7e91-cf06-4607-abbe-53eac89ac5ea 253:6 0 1.8T 0 crypt
    └─ceph--d38a7e91--cf06--4607--abbe--53eac89ac5ea-osd--block--d38a7e91--cf06--4607--abbe--53eac89ac5ea 253:7 0 1.8T 0 lvm
sdg 8:96 0 1.8T 0 disk
└─bcache5 252:640 0 1.8T 0 disk
  └─crypt-053e000a-76ed-427e-98b3-e5373e263f2d 253:8 0 1.8T 0 crypt
    └─ceph--053e000a--76ed--427e--98b3--e5373e263f2d-osd--block--053e000a--76ed--427e--98b3--e5373e263f2d 253:9 0 1.8T 0 lvm
sdh 8:112 0 1.8T 0 disk
└─bcache8 252:1024 0 1.8T 0 disk
  └─crypt-c2669da2-63aa-42e2-b049-cf00a478e076 253:25 0 1.8T 0 crypt
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Andrey, I don't know if you saw James' comment, as yours may have coincided with it, but if you can get the ceph-osd package version that would be helpful. Thanks!
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Yes, it is the latest - the cluster is being re-deployed as part of a Bootstack handover.

Corey, the bug you point to fixes the ordering of ceph and udev. Here, however, udev can't create the devices because they don't exist at the time udev runs - when the host boots and settles down, no PVs exist at all.
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Please can you confirm which version of the ceph-osd package you have installed; older versions rely on a charm-shipped udev ruleset, rather than it being provided by the packaging.
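A quick way to check both (a sketch; where the rules file lives shows whether it is packaged or charm-shipped):

    # Installed ceph-osd package version:
    dpkg-query -W -f='${Version}\n' ceph-osd

    # Does the package itself ship udev rules?
    dpkg -L ceph-osd | grep -i udev

    # Or were they dropped in place by the charm?
    ls -l /etc/udev/rules.d/ | grep -i ceph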
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This feels similar to https://bugs.launchpad.net/charm-ceph-osd/+bug/1812925. First question: are you running with the latest stable charms, which have the fix for that bug?
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
The ceph-osd package provides udev rules which should switch the owner of all Ceph-related LVM VGs to ceph:ceph.

# OSD LVM layout example
# VG prefix: ceph-
# LV prefix: osd-
ACTION=="add", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="disk", \
  ENV{DM_LV_NAME}=="osd-*", \
  ENV{DM_VG_NAME}=="ceph-*", \
  OWNER:="ceph", GROUP:="ceph", MODE:="660"

ACTION=="change", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="disk", \
  ENV{DM_LV_NAME}=="osd-*", \
  ENV{DM_VG_NAME}=="ceph-*", \
  OWNER="ceph", GROUP="ceph", MODE="660"
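If those rules are in place but a device was left with the wrong owner after boot, replaying the "change" event should re-apply them; a minimal sketch:

    # Re-trigger "change" events for block devices so the
    # OWNER/GROUP assignments above are re-applied:
    udevadm trigger --subsystem-match=block --action=change
    udevadm settle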
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
by-dname udev rules are created by MAAS/curtin as part of the server install I think.
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Steve, it is MAAS that creates these udev rules. We requested this feature in order to be able to use persistent names in further service configuration (using templating). We couldn't go with /dev/sdX names, as they may change after a reboot, and we can't use WWN names, as they are unique per node and don't allow us to use templates with FCB.
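For illustration, the by-dname rules curtin generates look roughly like this (the serial number and dname below are invented for the example):

    # Hypothetical example of a curtin-generated by-dname rule:
    # match the disk by serial number and publish a stable name.
    SUBSYSTEM=="block", ACTION=="add|change", ENV{DEVTYPE}=="disk", \
        ENV{ID_SERIAL}=="EXAMPLE_SERIAL_123", SYMLINK+="disk/by-dname/osd-data-0"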
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
> LVM module is supposed to create PVs from devices using the links in
> /dev/disk/by-dname/ folder that are created by udev.

Created by udev how? disk/by-dname is not part of the hierarchy that is populated by the standard udev rules, nor is it created by lvm2. Is there something in the ceph-osd packaging specifically which generates these links - and, in turn, depends on them for assembling LVs?

Can you provide udev logs (journalctl --no-pager -lu systemd-udevd.service; udevadm info -e) from the system following a boot on which this race is hit?

** Changed in: systemd (Ubuntu)
   Status: Confirmed => Incomplete
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Just one update: if I change the perms of the symlinks I made (chown -h), the OSD will actually start. After rebooting, however, I found that the links I had made were gone again, and the whole process needed repeating in order to start the OSD.
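For reference, the manual sequence that gets such an OSD going again would be along these lines (a sketch using osd.4 and the link targets from the earlier comment):

    cd /var/lib/ceph/osd/ceph-4
    ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.db
    ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.wal
    # chown -h changes the symlink itself rather than its target:
    chown -h ceph:ceph block.db block.wal
    systemctl start ceph-osd@4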
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
I'm seeing this in a slightly different manner, on Bionic/Queens. We have encrypted LVMs (thanks, Vault), and rebooting a host fairly consistently results in at least one OSD not returning. The LVs appear in the list; the difference between a working and a non-working OSD is that the non-working one lacks the links to block.db and block.wal. See https://pastebin.canonical.com/p/rW3VgMMkmY/ for some info.

If I made the links manually:

cd /var/lib/ceph/osd/ceph-4
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.db
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.wal

this resulted in a perms error accessing the device:

"bluestore(/var/lib/ceph/osd/ceph-4) _open_db /var/lib/ceph/osd/ceph-4/block.db symlink exists but target unusable: (13) Permission denied"

ls -l /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/
total 0
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-20
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-24
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-14
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-12
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-22
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-18
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-16
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-19
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-23
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-13
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-11
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-21
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-17
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-15

I tried to change the perms to ceph:ceph ownership, but no change. I have also tried (using `systemctl edit lvm2-monitor.service`) adding the following to lvm2, but that hasn't changed the behavior either:

# cat /etc/systemd/system/lvm2-monitor.service.d/override.conf
[Service]
ExecStartPre=/bin/sleep 60
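A less blunt variant of the same idea would be to re-scan LVM right before the OSDs start rather than delaying lvm2-monitor; a hypothetical, untested drop-in:

    # /etc/systemd/system/ceph-osd@.service.d/override.conf (hypothetical)
    [Service]
    # Mirror the manual workaround from the bug description:
    # re-scan and auto-activate LVM before each OSD starts.
    ExecStartPre=/sbin/lvm pvscan --cache --activate ay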
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Tags added: canonical-bootstack
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This manifests itself as the following, as reported by lsblk(1). Note the missing Ceph LVM volume on the 6th NVMe disk:

$ cat sos_commands/block/lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
|-sda1 8:1 0 512M 0 part /boot/efi
`-sda2 8:2 0 1.8T 0 part
  |-foobar--vg-root 253:0 0 1.8T 0 lvm /
  `-foobar--vg-swap_1 253:1 0 976M 0 lvm [SWAP]
nvme0n1 259:0 0 1.8T 0 disk
`-ceph--c576f63e--dfd4--48f7--9d60--6a7708cbccf6-osd--block--9fdd78b2--0745--47ae--b8d4--04d9803ab448 253:6 0 1.8T 0 lvm
nvme1n1 259:1 0 1.8T 0 disk
`-ceph--6eb6565f--6392--44a8--9213--833b09f7c0bc-osd--block--a7d3629c--724f--4218--9d15--593ec64781da 253:5 0 1.8T 0 lvm
nvme2n1 259:2 0 1.8T 0 disk
`-ceph--c14f9ee5--90d0--4306--9b18--99576516f76a-osd--block--bbf5bc79--edea--4e43--8414--b5140b409397 253:4 0 1.8T 0 lvm
nvme3n1 259:3 0 1.8T 0 disk
`-ceph--a821146b--7674--4bcc--b5e9--0126c4bd5e3b-osd--block--b9371499--ff99--4d3e--ab3f--62ec3cf918c4 253:3 0 1.8T 0 lvm
nvme4n1 259:4 0 1.8T 0 disk
`-ceph--2e39f75a--5d2a--49ee--beb1--5d0a2991fd6c-osd--block--a1be083e--1fa7--4397--acfa--2ff3d3491572 253:2 0 1.8T 0 lvm
nvme5n1 259:5 0 1.8T 0 disk
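Consistent with the workaround in the bug description, re-activating the PV on the affected disk should bring the missing LV back (a sketch, using the 6th disk above):

    /sbin/lvm pvscan --cache --activate ay /dev/nvme5n1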
[Touch-packages] [Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: systemd (Ubuntu)
   Status: New => Confirmed