[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Changed in: cloud-archive
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1828617

Title:
  Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1828617/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This bug was fixed in the package ceph - 12.2.12-0ubuntu0.18.04.3~cloud0

---
ceph (12.2.12-0ubuntu0.18.04.3~cloud0) xenial-queens; urgency=medium

  * New update for the Ubuntu Cloud Archive.

ceph (12.2.12-0ubuntu0.18.04.3) bionic; urgency=medium

  [ James Page ]
  * d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
    ensure that required wal and db devices are present before activating
    OSD's (LP: #1828617).

  [ Jesse Williamson ]
  * d/p/civetweb-755-1.8-somaxconn-configurable*.patch: Backport changes
    to civetweb to allow tuning of SOMAXCONN in Ceph RADOS Gateway
    deployments (LP: #1838109).

  [ James Page ]
  * d/p/rgw-gc-use-aio.patch: Cherry pick fix to switch to using AIO for
    garbage collection of objects in the Ceph RADOS Gateway (LP: #1838858).

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use "--no-restart-after-upgrade" and "--no-stop-on-upgrade"
      instead of "--no-restart-on-upgrade".
    - d/rules: Drop exclusion for ceph-[osd,mon,mds] for restarts.

** Changed in: cloud-archive/queens
   Status: Fix Committed => Fix Released
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.18.10.3~cloud0

---
ceph (13.2.6-0ubuntu0.18.10.3~cloud0) bionic; urgency=medium

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use "--no-restart-after-upgrade" and "--no-stop-on-upgrade"
      instead of "--no-restart-on-upgrade".
    - d/rules: Drop exclusion for ceph-[osd,mon,mds] for restarts.

  [ James Page ]
  * d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
    ensure that required wal and db devices are present before activating
    OSD's (LP: #1828617).

  [ Jesse Williamson ]
  * d/p/civetweb-755-1.8-somaxconn-configurable*.patch: Backport changes
    to civetweb to allow tuning of SOMAXCONN in Ceph RADOS Gateway
    deployments (LP: #1838109).

** Changed in: cloud-archive/rocky
   Status: Fix Committed => Fix Released
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.19.04.4~cloud0

---
ceph (13.2.6-0ubuntu0.19.04.4~cloud0) bionic-stein; urgency=medium

  * New update for the Ubuntu Cloud Archive.

ceph (13.2.6-0ubuntu0.19.04.4) disco; urgency=medium

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use "--no-restart-after-upgrade" and "--no-stop-on-upgrade"
      instead of "--no-restart-on-upgrade".
    - d/rules: Drop exclusion for ceph-[osd,mon,mds] for restarts.

  [ James Page ]
  * d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
    ensure that required wal and db devices are present before activating
    OSD's (LP: #1828617).

  [ Jesse Williamson ]
  * d/p/civetweb-755-1.8-somaxconn-configurable*.patch: Backport changes
    to civetweb to allow tuning of SOMAXCONN in Ceph RADOS Gateway
    deployments (LP: #1838109).

** Changed in: cloud-archive/stein
   Status: Fix Committed => Fix Released
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Changed in: cloud-archive/train
   Status: In Progress => Fix Released
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This bug was fixed in the package ceph - 12.2.12-0ubuntu0.18.04.3

---
ceph (12.2.12-0ubuntu0.18.04.3) bionic; urgency=medium

  [ James Page ]
  * d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
    ensure that required wal and db devices are present before activating
    OSD's (LP: #1828617).

  [ Jesse Williamson ]
  * d/p/civetweb-755-1.8-somaxconn-configurable*.patch: Backport changes
    to civetweb to allow tuning of SOMAXCONN in Ceph RADOS Gateway
    deployments (LP: #1838109).

  [ James Page ]
  * d/p/rgw-gc-use-aio.patch: Cherry pick fix to switch to using AIO for
    garbage collection of objects in the Ceph RADOS Gateway (LP: #1838858).

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use "--no-restart-after-upgrade" and "--no-stop-on-upgrade"
      instead of "--no-restart-on-upgrade".
    - d/rules: Drop exclusion for ceph-[osd,mon,mds] for restarts.

 -- James Page  Fri, 30 Aug 2019 10:11:09 +0100

** Changed in: ceph (Ubuntu Bionic)
   Status: Fix Committed => Fix Released
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.19.04.4

---
ceph (13.2.6-0ubuntu0.19.04.4) disco; urgency=medium

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use "--no-restart-after-upgrade" and "--no-stop-on-upgrade"
      instead of "--no-restart-on-upgrade".
    - d/rules: Drop exclusion for ceph-[osd,mon,mds] for restarts.

  [ James Page ]
  * d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
    ensure that required wal and db devices are present before activating
    OSD's (LP: #1828617).

  [ Jesse Williamson ]
  * d/p/civetweb-755-1.8-somaxconn-configurable*.patch: Backport changes
    to civetweb to allow tuning of SOMAXCONN in Ceph RADOS Gateway
    deployments (LP: #1838109).

 -- James Page  Fri, 30 Aug 2019 10:10:04 +0100

** Changed in: ceph (Ubuntu Disco)
   Status: Fix Committed => Fix Released
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Verification completed for bionic-rocky-proposed:

$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 13.2.6-0ubuntu0.18.10.3~cloud0
  Candidate: 13.2.6-0ubuntu0.18.10.3~cloud0
  Version table:
 *** 13.2.6-0ubuntu0.18.10.3~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-proposed/rocky/main amd64 Packages
        100 /var/lib/dpkg/status
     12.2.12-0ubuntu0.18.04.2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     12.2.4-0ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

** Tags removed: verification-rocky-needed
** Tags added: verification-rocky-done
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Verification completed on bionic-stein-proposed:

$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 13.2.6-0ubuntu0.19.04.4~cloud0
  Candidate: 13.2.6-0ubuntu0.19.04.4~cloud0
  Version table:
 *** 13.2.6-0ubuntu0.19.04.4~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-proposed/stein/main amd64 Packages
        100 /var/lib/dpkg/status
     12.2.12-0ubuntu0.18.04.2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     12.2.4-0ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

Rebooted machines 10 times with no reproduction of the issue.

** Tags removed: verification-stein-needed
** Tags added: verification-stein-done
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 13.2.6-0ubuntu0.19.04.4
  Candidate: 13.2.6-0ubuntu0.19.04.4
  Version table:
 *** 13.2.6-0ubuntu0.19.04.4 500
        500 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     13.2.6-0ubuntu0.19.04.3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu disco-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu disco-security/main amd64 Packages
     13.2.4+dfsg1-0ubuntu2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu disco/main amd64 Packages

disco-proposed tested with a deployment using separate db and wal
devices; OSDs restarted reliably over 10 reboot iterations across three
machines.

$ lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0       7:0    0 88.7M  1 loop /snap/core/7396
loop1       7:1    0 54.5M  1 loop
loop2       7:2    0   89M  1 loop /snap/core/7713
loop3       7:3    0 54.6M  1 loop /snap/lxd/11964
loop4       7:4    0 54.6M  1 loop /snap/lxd/11985
vda       252:0    0   20G  0 disk
├─vda1    252:1    0 19.9G  0 part /
├─vda14   252:14   0    4M  0 part
└─vda15   252:15   0  106M  0 part /boot/efi
vdb       252:16   0   40G  0 disk /mnt
vdc       252:32   0   10G  0 disk
└─ceph--683a8389--9788--4fd5--b59e--bdd69936a768-osd--block--683a8389--9788--4fd5--b59e--bdd69936a768
          253:0    0   10G  0 lvm
vdd       252:48   0   10G  0 disk
└─ceph--1fd8022f--e851--4cfa--82aa--64693510c705-osd--block--1fd8022f--e851--4cfa--82aa--64693510c705
          253:6    0   10G  0 lvm
vde       252:64   0   10G  0 disk
└─ceph--302bafc8--9981--47a3--b66b--3d84ab550ba5-osd--block--302bafc8--9981--47a3--b66b--3d84ab550ba5
          253:3    0   10G  0 lvm
vdf       252:80   0    5G  0 disk
├─ceph--db--28e3b53f--1468--4136--914d--6630343a2a67-osd--db--683a8389--9788--4fd5--b59e--bdd69936a768
│         253:2    0    1G  0 lvm
├─ceph--db--28e3b53f--1468--4136--914d--6630343a2a67-osd--db--302bafc8--9981--47a3--b66b--3d84ab550ba5
│         253:5    0    1G  0 lvm
└─ceph--db--28e3b53f--1468--4136--914d--6630343a2a67-osd--db--1fd8022f--e851--4cfa--82aa--64693510c705
          253:8    0    1G  0 lvm
vdg       252:96   0    5G  0 disk
├─ceph--wal--40c6b471--3ba2--41d5--9215--eabf391499de-osd--wal--683a8389--9788--4fd5--b59e--bdd69936a768
│         253:1    0   96M  0 lvm
├─ceph--wal--40c6b471--3ba2--41d5--9215--eabf391499de-osd--wal--302bafc8--9981--47a3--b66b--3d84ab550ba5
│         253:4    0   96M  0 lvm
└─ceph--wal--40c6b471--3ba2--41d5--9215--eabf391499de-osd--wal--1fd8022f--e851--4cfa--82aa--64693510c705
          253:7    0   96M  0 lvm

** Tags removed: verification-needed verification-needed-disco
** Tags added: verification-done verification-done-disco
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
bionic-proposed tested with a deployment using separate db and wal
devices; OSDs restarted reliably over 10 reboot iterations across three
machines.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Hello Andrey, or anyone else affected,

Accepted ceph into disco-proposed. The package will build now and be
available at https://launchpad.net/ubuntu/+source/ceph/13.2.6-0ubuntu0.19.04.4
in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-disco to verification-done-disco. If it does not fix
the bug for you, please add a comment stating that, and change the tag
to verification-failed-disco. In either case, without details of your
testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: ceph (Ubuntu Disco)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-disco

** Changed in: ceph (Ubuntu Bionic)
   Status: In Progress => Fix Committed

** Tags added: verification-needed-bionic
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** No longer affects: cloud-archive/pike

** Changed in: cloud-archive/train
   Status: New => In Progress

** Changed in: cloud-archive/stein
   Status: New => In Progress

** Changed in: cloud-archive/rocky
   Status: New => In Progress

** Changed in: cloud-archive/queens
   Status: New => In Progress

** Changed in: cloud-archive/queens
   Importance: Undecided => High

** Changed in: cloud-archive/rocky
   Importance: Undecided => High

** Changed in: cloud-archive/stein
   Importance: Undecided => High

** Changed in: cloud-archive/train
   Importance: Undecided => High

** CVE removed: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2019-10222

** Changed in: cloud-archive/train
   Assignee: (unassigned) => James Page (james-page)

** Changed in: cloud-archive/stein
   Assignee: (unassigned) => James Page (james-page)

** Changed in: cloud-archive/rocky
   Assignee: (unassigned) => James Page (james-page)

** Changed in: cloud-archive/queens
   Assignee: (unassigned) => James Page (james-page)
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This bug was fixed in the package ceph - 14.2.2-0ubuntu2

---
ceph (14.2.2-0ubuntu2) eoan; urgency=medium

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use "--no-restart-after-upgrade" and "--no-stop-on-upgrade"
      instead of "--no-restart-on-upgrade".
    - d/rules: Drop exclusion for ceph-[osd,mon,mds] for restarts.

  [ Jesse Williamson ]
  * d/p/civetweb-755-1.8-somaxconn-configurable*.patch: Backport changes
    to civetweb to allow tuning of SOMAXCONN in Ceph RADOS Gateway
    deployments (LP: #1838109).

  [ James Page ]
  * d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
    ensure that required wal and db devices are present before activating
    OSD's (LP: #1828617).

  [ Steve Beattie ]
  * SECURITY UPDATE: RADOS gateway remote denial of service
    - d/p/CVE-2019-10222.patch: rgw: asio: check the remote endpoint
      before processing requests.
    - CVE-2019-10222

 -- James Page  Thu, 29 Aug 2019 13:54:25 +0100

** Changed in: ceph (Ubuntu Eoan)
   Status: In Progress => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2019-10222
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Changed in: ceph (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: ceph (Ubuntu Disco)
   Status: New => In Progress

** Changed in: ceph (Ubuntu Disco)
   Assignee: (unassigned) => James Page (james-page)

** Changed in: ceph (Ubuntu Bionic)
   Assignee: (unassigned) => James Page (james-page)

** Changed in: ceph (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: ceph (Ubuntu Disco)
   Importance: Undecided => High

** Description changed:

+ [Impact]
+ For deployments where the bluestore DB and WAL devices are on separate
+ underlying devices, it's possible on reboot that the LVs configured on
+ these devices have not yet been scanned and detected; the OSD boot
+ process ignores this fact and tries to boot the OSD anyway as soon as
+ the primary LV supporting the OSD is detected, resulting in the OSD
+ crashing as required block device symlinks are not present.
+
+ [Test Case]
+ Deploy ceph with bluestore + separate DB and WAL devices.
+ Reboot servers.
+ OSD devices will fail to start after reboot (it's a race so not always).
+
+ [Regression Potential]
+ Low - the fix has landed upstream and simply ensures that if a separate
+ LV is expected for the DB and WAL devices for an OSD, the OSD will not
+ try to boot until they are present.
+
+ [Original Bug Report]
  Ubuntu 18.04.2 Ceph deployment.

  Ceph OSD devices utilizing LVM volumes pointing to udev-based physical
  devices. LVM module is supposed to create PVs from devices using the
  links in /dev/disk/by-dname/ folder that are created by udev. However
  on reboot it happens (not always, rather like a race condition) that
  Ceph services cannot start, and pvdisplay doesn't show any volumes
  created. The folder /dev/disk/by-dname/ however has all necessary
  devices created by the end of the boot process.

  The behaviour can be fixed manually by running the
  "/sbin/lvm pvscan --cache --activate ay /dev/nvme0n1" command to
  re-activate the LVM components, after which the services can be
  started.
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/stein
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/queens
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/pike
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/train
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/rocky
   Importance: Undecided
   Status: New
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Merge proposal linked:
   https://code.launchpad.net/~slashd/ubuntu/+source/ceph/+git/ceph/+merge/371549
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Also affects: ceph (Ubuntu Eoan)
   Importance: High
   Assignee: James Page (james-page)
   Status: In Progress

** Also affects: ceph (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Also affects: ceph (Ubuntu Bionic)
   Importance: Undecided
   Status: New
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Alternative fix proposed upstream - picking this in preference to
Corey's fix as it's in the right part of the codebase for ceph-volume.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Alternative fix: https://github.com/ceph/ceph/pull/28791

** Changed in: ceph (Ubuntu)
   Assignee: Corey Bryant (corey.bryant) => James Page (james-page)
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Building in ppa:ci-train-ppa-service/3535 (will take a few hours).
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
@David, thanks for the update. We could really use some testing of the
current proposed fix if you have a chance; that's in the PPA mentioned
above. The new code will wait for wal/db devices to arrive and has env
vars to adjust wait times -
http://docs.ceph.com/docs/mimic/ceph-volume/systemd/#failure-and-retries.

As for the pvscan issue, I don't think that is related to ceph.
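For anyone wanting to raise those wait times, a minimal sketch of a
systemd drop-in setting the two env vars documented at the linked page
(CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL). The
UNIT_DIR variable defaults to a temp dir so the snippet runs
unprivileged; on a real host it would be
/etc/systemd/system/ceph-volume@.service.d, followed by a
`systemctl daemon-reload`:

```shell
#!/bin/sh
# Sketch: widen ceph-volume's retry budget via a systemd drop-in.
# UNIT_DIR is a stand-in for /etc/systemd/system/ceph-volume@.service.d.
UNIT_DIR=${UNIT_DIR:-$(mktemp -d)}
mkdir -p "$UNIT_DIR"
cat > "$UNIT_DIR/retries.conf" <<'EOF'
[Service]
# 60 tries x 10s interval = wait up to 10 minutes for wal/db LVs
Environment=CEPH_VOLUME_SYSTEMD_TRIES=60
Environment=CEPH_VOLUME_SYSTEMD_INTERVAL=10
EOF
echo "wrote $UNIT_DIR/retries.conf"
```

The trade-off is only boot-time latency on the failure path: a healthy
host activates as soon as the LVs appear, so a generous budget is cheap.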
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Just adding that I've worked around this issue with the following added
to the lvm2-monitor overrides
(/etc/systemd/system/lvm2-monitor.service.d/custom.conf):

[Service]
ExecStartPre=/bin/sleep 60

This results in 100% success for every single boot, with no missed disks
nor missed LVM volumes applied to those block devices.

We've also disabled nvme multipathing on every Ceph storage node with
the following in /etc/d/g kernel boot args:

nvme_core.multipath=0

Note: This LP was cloned from an internal customer case where their Ceph
storage nodes were directly impacted by this issue, and this is the
current workaround deployed, until/unless we can find a consistent RC
for this issue in an upstream package.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Py2 bug found in code review upstream. Updated PPA again with fix.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Changed in: ceph (Ubuntu)
   Importance: Critical => High

** Changed in: ceph (Ubuntu)
   Status: Triaged => In Progress
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Note that the code looks for wal/db devices in the block device's LV
tags after it is found. In other words:

sudo lvs -o lv_tags | grep type=block | grep ceph.wal_device
sudo lvs -o lv_tags | grep type=block | grep ceph.db_device

This is the window where the following might not yet exist, yet we know
they *should* exist based on the above tags:

sudo lvs -o lv_tags | grep type=wal
sudo lvs -o lv_tags | grep type=db
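The waiting behaviour the fix adds for that window boils down to a
bounded poll. A hypothetical sketch - `wait_for_tag` and its arguments
are made up for this example, not ceph-volume's actual function names;
the real check consults lvs tags as shown above:

```shell
#!/bin/sh
# Sketch of a bounded retry loop: run CHECK_CMD until it succeeds or
# TRIES attempts (INTERVAL seconds apart) are exhausted. In ceph-volume
# the check is conceptually `lvs -o lv_tags | grep -q type=db` (or wal).
wait_for_tag() {
    check_cmd=$1 tries=$2 interval=$3
    while [ "$tries" -gt 0 ]; do
        if $check_cmd; then
            return 0            # expected LV showed up; safe to activate
        fi
        tries=$((tries - 1))
        echo "failed to find volume, retries left: $tries"
        sleep "$interval"
    done
    return 1                    # gave up; the OSD must not activate
}
```

With `true`/`false` standing in for the lvs check, `wait_for_tag true 5 0`
succeeds immediately, while `wait_for_tag false 3 0` exhausts its retries
and returns failure - which is the point: activation is refused rather
than attempted against a half-assembled OSD.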
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
I chatted with xav in IRC and he showed me a private link to the log
files. The ceph-volume-systemd.log.1 had timestamps of 2019-06-03 which
matches up with the last attempt (see comment #37). I didn't find any
logs from the new code in this log file. That likely means one of the
following: there were no wal/db devices found in lvs tags (ie. 'sudo lvs
-o lv_tags'), the new code isn't working, or the new code wasn't
installed.

I added a few more logs to the patch to help understand better what's
going on, and that's rebuilding in the PPA. I'm attaching all the
relevant code to show the log messages to look for.

** Attachment added: "snippet from src/ceph-volume/ceph_volume/systemd/main.py"
   https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1828617/+attachment/5269330/+files/main.py
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Any chance the log files got rotated and zipped? What does an ls of /var/log/ceph show?
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
The pvscan issue is likely something different, just wanted to make sure folks are aware of it for completeness. The logs /var/log/ceph/ceph-volume-systemd.log and ceph-volume.log are empty.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Do you have access to /var/log/ceph/ceph-volume-systemd.log after the latest reboot? That should give us some details such as:

[2019-05-31 20:43:44,334][systemd][WARNING] failed to find db volume, retries left: 17

or similar for the wal volume. If you see that the retries have been exceeded in your case you can tune them (the new loops are using the same env vars): http://docs.ceph.com/docs/mimic/ceph-volume/systemd/#failure-and-retries

As for the pvscan issue, I'm not sure if that is a ceph issue (?).
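For reference, the failure-and-retries doc linked above describes the CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL environment variables. One way to raise them would be a systemd drop-in for the ceph-volume template unit; the path and values below are illustrative, not something tested here:

```
# /etc/systemd/system/ceph-volume@.service.d/retries.conf (illustrative)
[Service]
Environment=CEPH_VOLUME_SYSTEMD_TRIES=60
Environment=CEPH_VOLUME_SYSTEMD_INTERVAL=10
```

followed by 'systemctl daemon-reload' before the next boot test.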
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Let me word that last comment differently. I went to the host and installed the PPA update, then rebooted. When the box booted up, the PV which hosts the wal LVs wasn't listed in lsblk or 'pvs' or lvs. I then ran pvscan --cache, which brought the LVs back online, but not the OSDs, so I rebooted. After that reboot, the behavior of the OSDs was exactly the same as prior to the update - I reboot, and some OSDs don't come online, and are missing symlinks.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
After installing that PPA update and rebooting, the PV for the wal didn't come online till I ran pvscan --cache. Seems a second reboot didn't do that though, might have been a red herring from prior attempts. Unfortunately, the OSDs didn't seem to come online in exactly the same way after installing the update.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
I've cherry-picked that patch to the package in the PPA if anyone can test. I'm fairly sure this will fix it, as I've been testing by removing/adding the volume-backed storage in my testing environment, and it will wait for the wal/db devices for a while if they don't exist.

** Changed in: ceph (Ubuntu)
   Status: New => Triaged

** Changed in: ceph (Ubuntu)
   Importance: Undecided => Critical

** Changed in: ceph (Ubuntu)
   Assignee: (unassigned) => Corey Bryant (corey.bryant)
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Upstream pull request: https://github.com/ceph/ceph/pull/28357
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Upstream ceph bug opened: https://tracker.ceph.com/issues/40100

** Bug watch added: tracker.ceph.com/issues #40100
   http://tracker.ceph.com/issues/40100
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
The 'ceph-volume lvm trigger' call appears to come from ceph source at src/ceph-volume/ceph_volume/systemd/main.py.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Thanks for testing. That should rule out udev as the cause of the race. A couple of observations from the log:

* There is a loop for each OSD that calls 'ceph-volume lvm trigger' up to 30 times until the OSD is activated, for example for osd.4:

[2019-05-31 01:27:29,235][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,435][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,530][systemd][WARNING] command returned non-zero exit status: 1
[2019-05-31 01:27:35,531][systemd][WARNING] failed activating OSD, retries left: 30
[2019-05-31 01:27:44,122][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:44,174][systemd][WARNING] command returned non-zero exit status: 1
[2019-05-31 01:27:44,175][systemd][WARNING] failed activating OSD, retries left: 29
...

I wonder if we can have similar 'ceph-volume lvm trigger' calls for WAL and DB devices per OSD. Does that even make sense? Or perhaps another call with a similar goal. We should be able to determine if an OSD has a DB or WAL device from the lvm tags.

* The first 3 OSDs that are activated are 18, 4, and 11, and they are the 3 that are missing the block.db/block.wal symlinks. That's just more confirmation this is a race:

[2019-05-31 01:28:03,370][systemd][INFO ] successfully trggered activation for: 18-eb5270dc-1110-420f-947e-aab7fae299c9
[2019-05-31 01:28:12,354][systemd][INFO ] successfully trggered activation for: 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:28:12,530][systemd][INFO ] successfully trggered activation for: 11-33de740d-bd8c-4b47-a601-3e6e634e489a
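The retry behavior seen in those log lines could in principle be extended to the wal/db case with the same shape of loop. A simplified sketch of such a loop (this is not the actual ceph-volume code; `retry_activate` and its arguments are invented for illustration):

```shell
# Generic retry loop in the spirit of the systemd activation retries
# seen in the log above. "$@" stands in for a command such as:
#   ceph-volume lvm trigger 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
retry_activate() {
    tries="$1"
    interval="$2"
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            echo "activated"
            return 0
        fi
        tries=$((tries - 1))
        echo "failed activating OSD, retries left: $tries" >&2
        sleep "$interval"
    done
    return 1
}
```

For example, 'retry_activate 30 5 ceph-volume lvm trigger 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6' would mirror the 30-try behavior shown above.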
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Hi,

Added the 'udevadm settle --timeout=5' in both of the two remaining if blocks in the referenced script. That did not make a difference. See https://pastebin.ubuntu.com/p/8f2ZXMRNgv/ for the ceph-volume-systemd.log.

At this boot, the OSDs numbered 4, 11 and 18 did not start, with the missing symlinks.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Note that there may only be a short window during system startup to catch missing tags with 'sudo lvs -o lv_tags'.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Some further references: each part of the OSD is queried for its underlying block device using blkid: https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L114

I guess that if the block device was not visible/present at the point that code runs during activate, then the symlink for the block.db or block.wal device would not be created, causing the OSD to fail to start.
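That hypothesis can be illustrated with a toy version of the symlink priming. This is a sketch, not the actual activate.py logic, and `prime_link` is an invented name:

```shell
# Toy model of activate's symlink priming: the block.db/block.wal link
# is only created when the backing device can be resolved right now.
prime_link() {
    dev="$1"    # backing LV device path
    link="$2"   # e.g. /var/lib/ceph/osd/ceph-4/block.db
    if [ -e "$dev" ]; then
        ln -sfn "$dev" "$link"
    else
        echo "device $dev not visible yet; $link not created" >&2
        return 1
    fi
}
```

If the LV only appears a few seconds later, the OSD directory is left without the link, which matches the observed failure mode.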
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Referenced from: https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L154
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
The ceph-volume tool assembles and primes the OSD directory using the LV tags written during the prepare action. It would be good to validate that these are OK with 'sudo lvs -o lv_tags'; the tags will contain UUID information about all of the block devices associated with an OSD.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Any output in /var/log/ceph/ceph-volume-systemd.log would also be useful.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
@Wouter, thanks for testing. I'm rebuilding the package without the checks, as they're probably preventing the udevadm settle from running. In the new build the 'udevadm settle --timeout=5' will run regardless. Let's see if that helps, and then we can fine-tune the checks surrounding the call later. Would you mind trying again once that builds (same PPA)?
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
@Wouter, since ceph takes so long to build, you could also manually add 'udevadm settle --timeout=5' to /usr/lib/ceph/ceph-osd-prestart.sh across the ceph-osd units to test that.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Hi,

Installed the packages from the above PPA, rebooted the host, and 4 out of 7 OSDs came up. The 3 that were missing from the `ceph osd tree` were not running the osd daemon, as they lacked the symlinks to the db and the wal.

Rebooted the server, and after the reboot other OSDs (again 3 out of 7) failed to start due to missing symlinks. This time it was other OSDs. So the issue is not fixed with the debs in the PPA.

Regards,

Wouter
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Thanks, will do. FWIW, the symlinks are in place before reboot.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
I'm building a test package for ceph with additional logic added to /usr/lib/ceph/ceph-osd-prestart.sh to allow block.wal and block.db additional time to settle. This is just a version to test the fix. I'm not sure if the behavior is the same as the journal case (symlink exists but file doesn't) but that's what I have in this change. Here's the PPA: https://launchpad.net/~corey.bryant/+archive/ubuntu/bionic-queens-1828617/+packages

Xav, any chance you could try this out once it builds?
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
I didn't recreate this, but I did get a deployment on serverstack with bluestore WAL and DB devices. That's done with:

1) juju deploy --series bionic --num-units 1 --constraints mem=2G --config expected-osd-count=1 --config monitor-count=1 cs:ceph-mon ceph-mon
2) juju deploy --series bionic --num-units 1 --constraints mem=2G --storage osd-devices=cinder,10G --storage bluestore-wal=cinder,1G --storage bluestore-db=cinder,1G cs:ceph-osd ceph-osd
3) juju add-relation ceph-osd ceph-mon

James Page mentioned taking a look at the systemd bits. The ceph-osd systemd unit, /lib/systemd/system/ceph-osd@.service, calls:

ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i

where /usr/lib/ceph/ceph-osd-prestart.sh has some logic that exits with an error code when certain things aren't ready. I think we might be able to add something in there. For example, it currently has:

data="/var/lib/ceph/osd/${cluster:-ceph}-$id"

if [ -L "$journal" -a ! -e "$journal" ]; then
    udevadm settle --timeout=5 || :
    if [ -L "$journal" -a ! -e "$journal" ]; then
        echo "ceph-osd(${cluster:-ceph}-$id): journal not present, not starting yet." 1>&2
        exit 0
    fi
fi

The 'udevadm settle' watches the udev event queue and exits once all current events are handled or after 5 seconds. Perhaps we can do something similar for this issue.

Here's what I see in /var/log/ceph/ceph-osd.0.log during a system reboot:

2019-05-29 19:04:25.800237 7fa6940d1700 1 freelist shutdown
...
2019-05-29 19:04:25.800548 7fa6940d1700 1 bdev(0x557eca7a1680 /var/lib/ceph/osd/ceph-0/block.wal) close
2019-05-29 19:04:26.079227 7fa6940d1700 1 bdev(0x557eca7a1200 /var/lib/ceph/osd/ceph-0/block.db) close
2019-05-29 19:04:26.266085 7fa6940d1700 1 bdev(0x557eca7a1440 /var/lib/ceph/osd/ceph-0/block) close
2019-05-29 19:04:26.474086 7fa6940d1700 1 bdev(0x557eca7a0fc0 /var/lib/ceph/osd/ceph-0/block) close
...
2019-05-29 19:04:53.601570 7fdd2ec17e40 1 bdev create path /var/lib/ceph/osd/ceph-0/block.db type kernel
2019-05-29 19:04:53.601581 7fdd2ec17e40 1 bdev(0x561e50583200 /var/lib/ceph/osd/ceph-0/block.db) open path /var/lib/ceph/osd/ceph-0/block.db
2019-05-29 19:04:53.601855 7fdd2ec17e40 1 bdev(0x561e50583200 /var/lib/ceph/osd/ceph-0/block.db) open size 1073741824 (0x4000, 1GiB) block_size 4096 (4KiB) rotational
2019-05-29 19:04:53.601867 7fdd2ec17e40 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 1GiB
2019-05-29 19:04:53.602131 7fdd2ec17e40 1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel
2019-05-29 19:04:53.602143 7fdd2ec17e40 1 bdev(0x561e50583440 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2019-05-29 19:04:53.602464 7fdd2ec17e40 1 bdev(0x561e50583440 /var/lib/ceph/osd/ceph-0/block) open size 10733223936 (0x27fc0, 10.0GiB) block_size 4096 (4KiB) rotational
2019-05-29 19:04:53.602480 7fdd2ec17e40 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 10.0GiB
2019-05-29 19:04:53.602499 7fdd2ec17e40 1 bdev create path /var/lib/ceph/osd/ceph-0/block.wal type kernel
2019-05-29 19:04:53.602502 7fdd2ec17e40 1 bdev(0x561e50583680 /var/lib/ceph/osd/ceph-0/block.wal) open path /var/lib/ceph/osd/ceph-0/block.wal
2019-05-29 19:04:53.602709 7fdd2ec17e40 1 bdev(0x561e50583680 /var/lib/ceph/osd/ceph-0/block.wal) open size 100663296 (0x600, 96MiB) block_size 4096 (4KiB) rotational
2019-05-29 19:04:53.602717 7fdd2ec17e40 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 96MiB
...
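Concretely, the journal check quoted above could be mirrored for block.db and block.wal along these lines. This is an untested sketch: `wait_for_dev` is an invented helper, and the real change would follow the script's existing style:

```shell
# Sketch: extend the journal-style guard to bluestore's block.db and
# block.wal. A dangling symlink gets one 'udevadm settle' before we
# conclude the device is not ready for this start attempt.
wait_for_dev() {
    link="$1"
    if [ -L "$link" ] && [ ! -e "$link" ]; then
        udevadm settle --timeout=5 2>/dev/null || :
    fi
    # succeed when there is no symlink at all, or the link resolves
    [ ! -L "$link" ] || [ -e "$link" ]
}

data="/var/lib/ceph/osd/${cluster:-ceph}-$id"
for part in block.db block.wal; do
    if ! wait_for_dev "$data/$part"; then
        echo "ceph-osd(${cluster:-ceph}-$id): $part not present, not starting yet." 1>&2
        exit 0
    fi
done
```

As with the journal check, exiting 0 lets systemd retry the unit later rather than marking it failed.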
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Couple typos in comment #19: I think bluestore-wal and bluestore-db needed 2G. Also s/exists/exits.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Package changed: systemd (Ubuntu) => ceph (Ubuntu)
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: ceph (Ubuntu)
   Status: New => Confirmed
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Thanks for all the details. I need to confirm this, but I think the block.db and block.wal symlinks are created as a result of 'ceph-volume lvm prepare --bluestore --data --block.wal --block.db '. That's coded in the ceph-osd charm around here: https://opendev.org/openstack/charm-ceph-osd/src/branch/master/lib/ceph/utils.py#L1558

Can you confirm that the symlinks are OK prior to reboot? I'd like to figure out if they are correctly set up by the charm initially.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
journalctl --no-pager -lu systemd-udevd.service >/tmp/1828617-1.out

Hostname obfuscated.

lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 88.4M 1 loop /snap/core/6964
loop1 7:1 0 89.4M 1 loop /snap/core/6818
loop2 7:2 0 8.4M 1 loop /snap/canonical-livepatch/77
sda 8:0 0 1.8T 0 disk
├─sda1 8:1 0 476M 0 part /boot/efi
├─sda2 8:2 0 3.7G 0 part /boot
└─sda3 8:3 0 1.7T 0 part
  └─bcache7 252:896 0 1.7T 0 disk /
sdb 8:16 0 1.8T 0 disk
└─bcache0 252:0 0 1.8T 0 disk
sdc 8:32 0 1.8T 0 disk
└─bcache6 252:768 0 1.8T 0 disk
  └─crypt-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 253:0 0 1.8T 0 crypt
    └─ceph--7478edfc--f321--40a2--a105--8e8a2c8ca3f6-osd--block--7478edfc--f321--40a2--a105--8e8a2c8ca3f6 253:2 0 1.8T 0 lvm
sdd 8:48 0 1.8T 0 disk
└─bcache4 252:512 0 1.8T 0 disk
  └─crypt-33de740d-bd8c-4b47-a601-3e6e634e489a 253:4 0 1.8T 0 crypt
    └─ceph--33de740d--bd8c--4b47--a601--3e6e634e489a-osd--block--33de740d--bd8c--4b47--a601--3e6e634e489a 253:5 0 1.8T 0 lvm
sde 8:64 0 1.8T 0 disk
└─bcache3 252:384 0 1.8T 0 disk
  └─crypt-eb5270dc-1110-420f-947e-aab7fae299c9 253:1 0 1.8T 0 crypt
    └─ceph--eb5270dc--1110--420f--947e--aab7fae299c9-osd--block--eb5270dc--1110--420f--947e--aab7fae299c9 253:3 0 1.8T 0 lvm
sdf 8:80 0 1.8T 0 disk
└─bcache1 252:128 0 1.8T 0 disk
  └─crypt-d38a7e91-cf06-4607-abbe-53eac89ac5ea 253:6 0 1.8T 0 crypt
    └─ceph--d38a7e91--cf06--4607--abbe--53eac89ac5ea-osd--block--d38a7e91--cf06--4607--abbe--53eac89ac5ea 253:7 0 1.8T 0 lvm
sdg 8:96 0 1.8T 0 disk
└─bcache5 252:640 0 1.8T 0 disk
  └─crypt-053e000a-76ed-427e-98b3-e5373e263f2d 253:8 0 1.8T 0 crypt
    └─ceph--053e000a--76ed--427e--98b3--e5373e263f2d-osd--block--053e000a--76ed--427e--98b3--e5373e263f2d 253:9 0 1.8T 0 lvm
sdh 8:112 0 1.8T 0 disk
└─bcache8 252:1024 0 1.8T 0 disk
  └─crypt-c2669da2-63aa-42e2-b049-cf00a478e076 253:25 0 1.8T 0 crypt
    └─ceph--c2669da2--63aa--42
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
udevadm info -e >/tmp/1828617-2.out

~# ls -l /var/lib/ceph/osd/ceph*
-rw------- 1 ceph ceph 69 May 21 08:44 /var/lib/ceph/osd/ceph.client.osd-upgrade.keyring

/var/lib/ceph/osd/ceph-11:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-33de740d-bd8c-4b47-a601-3e6e634e489a/osd-block-33de740d-bd8c-4b47-a601-3e6e634e489a
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-18:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-eb5270dc-1110-420f-947e-aab7fae299c9/osd-block-eb5270dc-1110-420f-947e-aab7fae299c9
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-eb5270dc-1110-420f-947e-aab7fae299c9
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-24:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-d38a7e91-cf06-4607-abbe-53eac89ac5ea/osd-block-d38a7e91-cf06-4607-abbe-53eac89ac5ea
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-31:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-053e000a-76ed-427e-98b3-e5373e263f2d/osd-block-053e000a-76ed-427e-98b3-e5373e263f2d
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-053e000a-76ed-427e-98b3-e5373e263f2d
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-38:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-c2669da2-63aa-42e2-b049-cf00a478e076/osd-block-c2669da2-63aa-42e2-b049-cf00a478e076
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-c2669da2-63aa-42e2-b049-cf00a478e076
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-4:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-7478edfc-f321-40a2-a105-8e8a2c8ca3f6/osd-block-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 55 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 2 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-45:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e/osd-block-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

** Attachment added: "1828617-2.out"
   https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1828617/+attachment/5267247/+files/1828617-2.out
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Charm is cs:ceph-osd-284; Ceph version is 12.2.11-0ubuntu0.18.04.2. The udev rules are created by curtin during the MAAS install. Here's an example udev rule:

cat bcache4.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{CACHED_UUID}=="7b0e872b-ac78-4c4e-af18-8ccdce5962f6", SYMLINK+="disk/by-dname/bcache4"

The problem here is that when the host boots, for some OSDs (random, changes each boot), there are no symlinks for block.db and block.wal in /var/lib/ceph/osd/ceph-${thing}. If I manually create those two symlinks (and make sure the perms are right for the links themselves), then the OSD starts. Some of the OSDs do get those links, though, and that's interesting because on these hosts the ceph wal and db for all the OSDs are LVs on the same NVMe device - the same partition, in fact. The ceph OSD block device is an LV on a different device.

** Changed in: systemd (Ubuntu)
   Status: Incomplete => New

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1828617

Title:
  Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1828617/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
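The manual workaround described above (recreate the two missing symlinks, then fix the permissions on the links themselves) can be sketched as a small shell helper. This is a hypothetical sketch, not part of the charm or the packaging; the example paths are taken from listings elsewhere in this report, and the chown step is guarded so the sketch is safe to run outside a real ceph host:

```shell
# fix_osd_links OSD_DIR DB_LV WAL_LV
# Recreate missing block.db/block.wal symlinks for one OSD, then fix the
# ownership of the links themselves, as described in the comment above.
fix_osd_links() {
    osd_dir=$1 db_lv=$2 wal_lv=$3
    # Create the links only if they are missing.
    [ -e "$osd_dir/block.db" ]  || ln -s "$db_lv"  "$osd_dir/block.db"
    [ -e "$osd_dir/block.wal" ] || ln -s "$wal_lv" "$osd_dir/block.wal"
    # chown -h changes the symlink itself (which is what the OSD checks);
    # it needs root and an existing ceph user, hence the guard.
    if [ "$(id -u)" -eq 0 ] && getent passwd ceph >/dev/null 2>&1; then
        chown -h ceph:ceph "$osd_dir/block.db" "$osd_dir/block.wal"
    fi
}

# Example invocation on the affected host (paths from the report):
# fix_osd_links /var/lib/ceph/osd/ceph-4 \
#     /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 \
#     /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
```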
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Andrey, I don't know if you saw James' comment, as yours may have crossed with it, but if you can get the ceph-osd package version that would be helpful. Thanks!
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Yes, it is the latest - the cluster is being re-deployed as part of a Bootstack handover.

Corey, the bug you point to fixes the ordering of ceph relative to udev. Here, however, udev cannot create any devices, because they do not exist at the time udev runs - when the host boots and settles down, no PVs exist at all.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Please can you confirm which version of the ceph-osd package you have installed; older versions rely on a charm-shipped udev ruleset, rather than it being provided by the packaging.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This feels similar to https://bugs.launchpad.net/charm-ceph-osd/+bug/1812925. First question: are you running with the latest stable charms, which have the fix for that bug?
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
The ceph-osd package provides udev rules which should switch the owner of all ceph-related LVM VGs to ceph:ceph:

# OSD LVM layout example
# VG prefix: ceph-
# LV prefix: osd-
ACTION=="add", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="disk", \
  ENV{DM_LV_NAME}=="osd-*", \
  ENV{DM_VG_NAME}=="ceph-*", \
  OWNER:="ceph", GROUP:="ceph", MODE:="660"

ACTION=="change", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="disk", \
  ENV{DM_LV_NAME}=="osd-*", \
  ENV{DM_VG_NAME}=="ceph-*", \
  OWNER="ceph", GROUP="ceph", MODE="660"
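One detail worth noting about the rules above: udev's `==` comparisons are shell-style glob matches, so `DM_VG_NAME=="ceph-*"` covers both the per-OSD block VGs (`ceph-<uuid>`) and the shared wal/db VG (`ceph-wal-<uuid>`), and `DM_LV_NAME=="osd-*"` covers `osd-block-*`, `osd-db-*`, and `osd-wal-*` alike. A quick plain-shell re-check of the same globs (no udev involved; the helper name is ours, not udev's):

```shell
# matches_rule VG_NAME LV_NAME
# Re-applies the DM_VG_NAME/DM_LV_NAME globs from the packaged rule using
# plain shell `case` patterns; returns 0 when both match, as udev would.
matches_rule() {
    case $1 in ceph-*) ;; *) return 1 ;; esac
    case $2 in osd-*)  ;; *) return 1 ;; esac
    return 0
}

# VG/LV names taken from the listings in this report:
matches_rule ceph-c2669da2-63aa-42e2-b049-cf00a478e076 \
    osd-block-c2669da2-63aa-42e2-b049-cf00a478e076 && echo "block LV matched"
matches_rule ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e \
    osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 && echo "wal/db LV matched"
matches_rule foobar-vg root || echo "non-ceph LV skipped"
```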
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
by-dname udev rules are created by MAAS/curtin as part of the server install, I think.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Steve, it is MAAS that creates these udev rules. We requested this feature so that we could use persistent names in further service configuration (using templating). We couldn't go with /dev/sdX names, as they may change after a reboot, and we can't use WWN names, as they are unique per node and don't allow us to use templates with FCB.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
> LVM module is supposed to create PVs from devices using the links in
> /dev/disk/by-dname/ folder that are created by udev.

Created by udev how? disk/by-dname is not part of the hierarchy that is populated by the standard udev rules, nor is it created by lvm2. Is there something in the ceph-osd packaging specifically which generates these links - and, in turn, depends on them for assembling LVs?

Can you provide udev logs (journalctl --no-pager -lu systemd-udevd.service; udevadm info -e) from the system following a boot on which this race is hit?

** Changed in: systemd (Ubuntu)
   Status: Confirmed => Incomplete
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Just one update: if I change the perms of the symlink I made (chown -h), the OSD will actually start. After rebooting, however, I found that the links I had made were gone again, and the whole process needed repeating in order to start the OSD.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Added field-critical; there's a cloud deployment ongoing where I currently can't reboot any hosts, nor get back some of the OSDs from a host I rebooted, until we have a workaround.
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
I'm seeing this in a slightly different manner, on Bionic/Queens. We have the LVs encrypted (thanks, Vault), and rebooting a host fairly consistently results in at least one OSD not returning. The LVs appear in the list; however, the difference between a working and a non-working OSD is the lack of links to block.db and block.wal on a non-working OSD. See https://pastebin.canonical.com/p/rW3VgMMkmY/ for some info.

If I made the links manually:

cd /var/lib/ceph/osd/ceph-4
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.db
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.wal

this resulted in a perms error accessing the device:

bluestore(/var/lib/ceph/osd/ceph-4) _open_db /var/lib/ceph/osd/ceph-4/block.db symlink exists but target unusable: (13) Permission denied

ls -l /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/
total 0
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-20
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-24
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-14
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-12
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-22
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-18
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-16
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-19
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-23
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-13
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-11
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-21
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-17
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-15

I tried to change the perms to ceph:ceph ownership, but no change. I have also tried (using `systemctl edit lvm2-monitor.service`) adding the following to lvm2, but that hasn't changed the behaviour either:

# cat /etc/systemd/system/lvm2-monitor.service.d/override.conf
[Service]
ExecStartPre=/bin/sleep 60
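Given the mixed root:root/ceph:ceph ownership in the listing above, a brute-force interim sweep is one way to reapply the expected ownership to every osd-* symlink in the wal/db VG directory. This is a hypothetical sketch (the helper name is ours, not from the charm or packaging); the chown is guarded so the function is harmless on a machine without root or a ceph user:

```shell
# fix_vg_link_perms VG_DIR
# Walk the osd-* entries in a ceph VG directory, report each symlink found,
# and (when possible) reset the link's own ownership with chown -h.
fix_vg_link_perms() {
    vg_dir=$1
    for link in "$vg_dir"/osd-*; do
        [ -L "$link" ] || continue       # only symlinks to the dm-* nodes
        echo "fixing $link"
        # -h changes the symlink itself, not its dm target; needs root
        # plus an existing ceph user, hence the guard.
        if [ "$(id -u)" -eq 0 ] && getent passwd ceph >/dev/null 2>&1; then
            chown -h ceph:ceph "$link"
        fi
    done
}

# Example invocation on the affected host (path from the report):
# fix_vg_link_perms /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e
```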
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
** Tags added: canonical-bootstack
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This manifests itself as the following, as reported by lsblk(1). Note the missing Ceph LVM volume on the 6th NVMe disk (nvme5n1):

$ cat sos_commands/block/lsblk
NAME                                                                                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0  1.8T  0 disk
|-sda1                                                                                                  8:1    0  512M  0 part /boot/efi
`-sda2                                                                                                  8:2    0  1.8T  0 part
  |-foobar--vg-root                                                                                   253:0    0  1.8T  0 lvm  /
  `-foobar--vg-swap_1                                                                                 253:1    0  976M  0 lvm  [SWAP]
nvme0n1                                                                                               259:0    0  1.8T  0 disk
`-ceph--c576f63e--dfd4--48f7--9d60--6a7708cbccf6-osd--block--9fdd78b2--0745--47ae--b8d4--04d9803ab448 253:6    0  1.8T  0 lvm
nvme1n1                                                                                               259:1    0  1.8T  0 disk
`-ceph--6eb6565f--6392--44a8--9213--833b09f7c0bc-osd--block--a7d3629c--724f--4218--9d15--593ec64781da 253:5    0  1.8T  0 lvm
nvme2n1                                                                                               259:2    0  1.8T  0 disk
`-ceph--c14f9ee5--90d0--4306--9b18--99576516f76a-osd--block--bbf5bc79--edea--4e43--8414--b5140b409397 253:4    0  1.8T  0 lvm
nvme3n1                                                                                               259:3    0  1.8T  0 disk
`-ceph--a821146b--7674--4bcc--b5e9--0126c4bd5e3b-osd--block--b9371499--ff99--4d3e--ab3f--62ec3cf918c4 253:3    0  1.8T  0 lvm
nvme4n1                                                                                               259:4    0  1.8T  0 disk
`-ceph--2e39f75a--5d2a--49ee--beb1--5d0a2991fd6c-osd--block--a1be083e--1fa7--4397--acfa--2ff3d3491572 253:2    0  1.8T  0 lvm
nvme5n1                                                                                               259:5    0  1.8T  0 disk
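The symptom in the lsblk output above can be checked mechanically: a disk is suspect if no lvm row follows it before the next disk row (lsblk lists children directly after their parent). A rough sketch of such a filter; `find_bare_disks` is our name, not part of any tooling in this bug, and it assumes two-column `lsblk -nr -o NAME,TYPE` style input:

```shell
# find_bare_disks: read "NAME TYPE" rows on stdin and print the names of
# disks that have no lvm descendant before the next disk row appears.
find_bare_disks() {
    awk '$2 == "disk" { if (prev != "") print prev; prev = $1 }
         $2 == "lvm"  { prev = "" }
         END          { if (prev != "") print prev }'
}

# On the host above this would flag nvme5n1, e.g.:
# lsblk -nr -o NAME,TYPE | find_bare_disks
```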
[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: systemd (Ubuntu)
   Status: New => Confirmed