[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-12-11 Thread James Page
** Changed in: cloud-archive
       Status: Fix Committed => Fix Released

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-10-30 Thread James Page
This bug was fixed in the package ceph - 12.2.12-0ubuntu0.18.04.3~cloud0

---
ceph (12.2.12-0ubuntu0.18.04.3~cloud0) xenial-queens; urgency=medium

  * New update for the Ubuntu Cloud Archive.

ceph (12.2.12-0ubuntu0.18.04.3) bionic; urgency=medium

  [ James Page ]
  *

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-10-14 Thread James Page
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.18.10.3~cloud0

---
ceph (13.2.6-0ubuntu0.18.10.3~cloud0) bionic; urgency=medium

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-10-14 Thread James Page
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.19.04.4~cloud0

---
ceph (13.2.6-0ubuntu0.19.04.4~cloud0) bionic-stein; urgency=medium

  * New update for the Ubuntu Cloud Archive.

ceph (13.2.6-0ubuntu0.19.04.4) disco; urgency=medium

  [ Eric Desrochers ]
  *

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-10-10 Thread James Page
** Changed in: cloud-archive/train
       Status: In Progress => Fix Released

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-09-30 Thread Launchpad Bug Tracker
This bug was fixed in the package ceph - 12.2.12-0ubuntu0.18.04.3

---
ceph (12.2.12-0ubuntu0.18.04.3) bionic; urgency=medium

  [ James Page ]
  * d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
    ensure that required wal and db devices are present before

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-09-23 Thread Launchpad Bug Tracker
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.19.04.4

---
ceph (13.2.6-0ubuntu0.19.04.4) disco; urgency=medium

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-09-20 Thread James Page
Verification completed for bionic-rocky-proposed

$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 13.2.6-0ubuntu0.18.10.3~cloud0
  Candidate: 13.2.6-0ubuntu0.18.10.3~cloud0
  Version table:
 *** 13.2.6-0ubuntu0.18.10.3~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-09-20 Thread James Page
Verification completed on bionic-stein-proposed:

$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 13.2.6-0ubuntu0.19.04.4~cloud0
  Candidate: 13.2.6-0ubuntu0.19.04.4~cloud0
  Version table:
 *** 13.2.6-0ubuntu0.19.04.4~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-09-20 Thread James Page
$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 13.2.6-0ubuntu0.19.04.4
  Candidate: 13.2.6-0ubuntu0.19.04.4
  Version table:
 *** 13.2.6-0ubuntu0.19.04.4 500
        500 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 Packages
        100 /var/lib/dpkg/status

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-09-19 Thread James Page
bionic-proposed tested with a deployment using separate db and wal devices; OSDs restarted reliably over 10 reboot iterations across three machines.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-09-05 Thread Ɓukasz Zemczak
Hello Andrey, or anyone else affected,

Accepted ceph into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/13.2.6-0ubuntu0.19.04.4 in a few hours, and then in the -proposed repository. Please help us by testing this new package. See

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-08-30 Thread James Page
** No longer affects: cloud-archive/pike

** Changed in: cloud-archive/train
       Status: New => In Progress

** Changed in: cloud-archive/stein
       Status: New => In Progress

** Changed in: cloud-archive/rocky
       Status: New => In Progress

** Changed in: cloud-archive/queens

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-08-29 Thread Launchpad Bug Tracker
This bug was fixed in the package ceph - 14.2.2-0ubuntu2

---
ceph (14.2.2-0ubuntu2) eoan; urgency=medium

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use "--no-restart-after-upgrade" and

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-08-29 Thread James Page
** Changed in: ceph (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: ceph (Ubuntu Disco)
       Status: New => In Progress

** Changed in: ceph (Ubuntu Disco)
     Assignee: (unassigned) => James Page (james-page)

** Changed in: ceph (Ubuntu Bionic)
     Assignee: (unassigned) =>

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-08-20 Thread Launchpad Bug Tracker
** Merge proposal linked: https://code.launchpad.net/~slashd/ubuntu/+source/ceph/+git/ceph/+merge/371549

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-07-04 Thread James Page
** Also affects: ceph (Ubuntu Eoan)
   Importance: High
     Assignee: James Page (james-page)
       Status: In Progress

** Also affects: ceph (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Also affects: ceph (Ubuntu Bionic)
   Importance: Undecided
       Status: New

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-07-03 Thread James Page
Alternative fix proposed upstream - picking this in preference to Corey's fix as it's in the right part of the codebase for ceph-volume.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-07-03 Thread James Page
Alternative fix: https://github.com/ceph/ceph/pull/28791

** Changed in: ceph (Ubuntu)
     Assignee: Corey Bryant (corey.bryant) => James Page (james-page)

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-07-03 Thread James Page
Building in ppa:ci-train-ppa-service/3535 (will take a few hours).

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-17 Thread Corey Bryant
@David, thanks for the update. We could really use some testing of the current proposed fix if you have a chance. That's in a PPA mentioned above. The new code will wait for wal/db devices to arrive and has env vars to adjust wait times - http://docs.ceph.com/docs/mimic/ceph-
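
For anyone picking this up for testing, a minimal sketch of how those wait times could be bumped, assuming the knobs are the CEPH_VOLUME_SYSTEMD_TRIES / CEPH_VOLUME_SYSTEMD_INTERVAL environment variables described in the ceph-volume systemd docs linked above (the values and the drop-in path here are illustrative, not a recommendation):

    sudo mkdir -p /etc/systemd/system/ceph-volume@.service.d
    cat <<'EOF' | sudo tee /etc/systemd/system/ceph-volume@.service.d/override.conf
    [Service]
    # retry OSD activation more times / wait longer between attempts
    Environment=CEPH_VOLUME_SYSTEMD_TRIES=60
    Environment=CEPH_VOLUME_SYSTEMD_INTERVAL=10
    EOF
    sudo systemctl daemon-reload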

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-13 Thread David A. Desrosiers
Just adding that I've worked around this issue with the following added to the lvm2-monitor overrides (/etc/systemd/system/lvm2-monitor.service.d/custom.conf):

[Service]
ExecStartPre=/bin/sleep 60

This results in 100% success for every single boot, with no missed disks nor missed LVM volumes
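
For anyone wanting to reproduce that workaround, a minimal sketch (the 60-second delay is the value from the comment above; tune it to your hardware):

    sudo mkdir -p /etc/systemd/system/lvm2-monitor.service.d
    cat <<'EOF' | sudo tee /etc/systemd/system/lvm2-monitor.service.d/custom.conf
    [Service]
    # delay lvm2-monitor so slowly-appearing devices are present before OSDs start
    ExecStartPre=/bin/sleep 60
    EOF
    sudo systemctl daemon-reload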

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-12 Thread Corey Bryant
Py2 bug found in code review upstream. Updated PPA again with fix.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-11 Thread James Page
** Changed in: ceph (Ubuntu)
   Importance: Critical => High

** Changed in: ceph (Ubuntu)
       Status: Triaged => In Progress

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-07 Thread Corey Bryant
Note that the code looks for wal/db devices in the block device's LV tags after it is found. In other words:

sudo lvs -o lv_tags | grep type=block | grep ceph.wal_device
sudo lvs -o lv_tags | grep type=block | grep ceph.db_device

This is the window where the following might not yet exist, yet we
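
A rough way to watch that window by hand - pull the wal/db device paths out of the LV tags and check whether they exist yet (a sketch; the tag parsing is illustrative only):

    sudo lvs --noheadings -o lv_tags | tr ',' '\n' \
      | grep -E 'ceph\.(wal|db)_device=' | cut -d= -f2 | sort -u \
      | while read dev; do
          [ -e "$dev" ] && echo "present: $dev" || echo "MISSING: $dev"
        done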

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-07 Thread Corey Bryant
I chatted with xav in IRC and he showed me a private link to the log files. The ceph-volume-systemd.log.1 had timestamps of 2019-06-03 which matches up with the last attempt (see comment #37). I didn't find any logs from the new code in this log file. That likely means one of the following: there

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-05 Thread Corey Bryant
Any chance the log files got rotated and zipped? What does an ls of /var/log/ceph show?

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-04 Thread Xav Paice
The pvscan issue is likely something different; I just wanted to make sure folks are aware of it for completeness. The logs /var/log/ceph/ceph-volume-systemd.log and ceph-volume.log are empty.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-04 Thread Corey Bryant
Do you have access to the /var/log/ceph/ceph-volume-systemd.log after the latest reboot? That should give us some details such as: "[2019-05-31 20:43:44,334][systemd][WARNING] failed to find db volume, retries left: 17" or similar for wal volume. If you see that the retries have been exceeded
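
A quick way to pull just those retry messages out, if that's easier than reading the whole log (a sketch; same log path as above):

    sudo grep -E 'failed to find (db|wal) volume' /var/log/ceph/ceph-volume-systemd.log | tail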

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-04 Thread Xav Paice
Let me word that last comment differently. I went to the host and installed the PPA update, then rebooted. When the box booted up, the PV which hosts the wal LVs wasn't listed in lsblk or 'pvs' or lvs. I then ran pvscan --cache, which brought the LVs back online, but not the OSDs, so I
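
For reference, the manual recovery being described is roughly the sequence below; the activate step is my assumption about what follows the truncated comment, not something stated in it:

    sudo pvscan --cache                   # rescan so the missing wal PV/LVs show up again
    sudo ceph-volume lvm activate --all   # then re-activate any OSDs that failed to start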

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-06-03 Thread Xav Paice
After installing that PPA update and rebooting, the PV for the wal didn't come online until I ran pvscan --cache. It seems a second reboot didn't need that, though; it might have been a red herring from prior attempts. Unfortunately, the OSDs didn't seem to come online in exactly the same way after

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-31 Thread Corey Bryant
I've cherry-picked that patch to the package in the PPA if anyone can test. I'm fairly sure this will fix it, as I've been testing by removing/adding the volume-backed storage in my test environment, and it will wait for the wal/db devices for a while if they don't exist.

** Changed in: ceph

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-31 Thread Corey Bryant
Upstream pull request: https://github.com/ceph/ceph/pull/28357

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-31 Thread Corey Bryant
Upstream ceph bug opened: https://tracker.ceph.com/issues/40100

** Bug watch added: tracker.ceph.com/issues #40100
   http://tracker.ceph.com/issues/40100

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-31 Thread Corey Bryant
The 'ceph-volume lvm trigger' call appears to come from ceph source at src/ceph-volume/ceph_volume/systemd/main.py.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-31 Thread Corey Bryant
Thanks for testing. That should rule out udev as the cause of the race. A couple of observations from the log:

* There is a loop for each osd that calls 'ceph-volume lvm trigger' 30 times until the OSD is activated, for example for 4:
  [2019-05-31 01:27:29,235][ceph_volume.process][INFO ]
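
For anyone following along, the retry behaviour in src/ceph-volume/ceph_volume/systemd/main.py amounts to something like the loop below (a simplified shell sketch of the logic, not the actual code; the 30-try count matches the log above, while the 5-second interval is an assumption):

    osd_id=4                           # example OSD id from the log above
    osd_uuid="REPLACE-WITH-OSD-FSID"   # placeholder
    tries=30; interval=5
    while [ "$tries" -gt 0 ]; do
        ceph-volume lvm trigger "${osd_id}-${osd_uuid}" && break
        tries=$((tries - 1))
        sleep "$interval"
    done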

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-30 Thread Wouter van Bommel
Hi, I added the udevadm settle --timeout=5 in both of the 2 remaining if blocks in the referenced script. That did not make a difference. See https://pastebin.ubuntu.com/p/8f2ZXMRNgv/ for the ceph-volume-systemd.log. At this boot, the OSDs with numbers 4, 11 & 18 did not start, with the missing

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-30 Thread Corey Bryant
Note that there may only be a short window during system startup to catch missing tags with 'sudo lvs -o lv_tags'.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-30 Thread James Page
Some further references:

Each part of the OSD is queried for its underlying block device using blkid:
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L114

I guess that if the block device was not visible/present at the point that code runs

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-30 Thread James Page
Referenced from: https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L154

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-30 Thread James Page
The ceph-volume tool assembles and primes the OSD directory using the LV tags written during the prepare action - it would be good to validate these are OK with 'sudo lvs -o lv_tags'. The tags will contain UUID information about all of the block devices associated with an OSD.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-30 Thread James Page
Any output in /var/log/ceph/ceph-volume-systemd.log would also be useful.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-30 Thread Corey Bryant
@Wouter, thanks for testing. I'm rebuilding the package without the checks, as they're probably preventing the udevadm settle from running. In the new build, the 'udevadm settle --timeout=5' will run regardless. Let's see if that helps, and then we can fine-tune the checks surrounding the call later.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-30 Thread Corey Bryant
@Wouter, since ceph takes so long to build, you could also manually add 'udevadm settle --timeout=5' to /usr/lib/ceph/ceph-osd-prestart.sh across the ceph-osd units to test that.
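
A minimal sketch of doing that by hand on each OSD host (the insertion point right after the shebang is my choice; keep a backup, since a package upgrade will overwrite the script):

    sudo cp /usr/lib/ceph/ceph-osd-prestart.sh /usr/lib/ceph/ceph-osd-prestart.sh.orig
    # append a settle call after line 1 (the shebang)
    sudo sed -i '1a udevadm settle --timeout=5' /usr/lib/ceph/ceph-osd-prestart.sh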

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-29 Thread Wouter van Bommel
Hi, I installed the packages from the above PPA, rebooted the host, and 4 out of 7 OSDs came up. The 3 that were missing from the `ceph osd tree` were not running the osd daemon, as they lacked the symlinks to the db and the wal. I rebooted the server, and after the reboot other OSDs (again 3 out

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-29 Thread Xav Paice
Thanks, will do. FWIW, the symlinks are in place before reboot.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-29 Thread Corey Bryant
I'm building a test package for ceph with additional logic added to /usr/lib/ceph/ceph-osd-prestart.sh to allow block.wal and block.db additional time to settle. This is just a version to test the fix. I'm not sure if the behavior is the same as the journal file case (symlink exists but file doesn't), but
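
For context, the sort of logic being added is roughly the following (a sketch of the idea only, not the actual patch; it assumes the $cluster/$id variables the prestart script already derives from its --cluster/--id arguments):

    # give block.wal/block.db symlink targets a moment to appear before the OSD starts
    data="/var/lib/ceph/osd/${cluster:-ceph}-$id"
    for link in "$data/block.wal" "$data/block.db"; do
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            # symlink exists but its target does not yet - wait for udev to catch up
            udevadm settle --timeout=5
        fi
    done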

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-29 Thread Corey Bryant
I didn't recreate this but I did get a deployment on serverstack with bluestore WAL and DB devices. That's done with:

1) juju deploy --series bionic --num-units 1 --constraints mem=2G --config expected-osd-count=1 --config monitor-count=1 cs:ceph-mon ceph-mon
2) juju deploy --series bionic

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-29 Thread Corey Bryant
A couple of typos in comment #19: I think bluestore-wal and bluestore-db needed 2G. Also, s/exists/exits.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-29 Thread Dimitri John Ledkov
** Package changed: systemd (Ubuntu) => ceph (Ubuntu)

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-29 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: ceph (Ubuntu)
       Status: New => Confirmed

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-29 Thread Corey Bryant
Thanks for all the details. I need to confirm this but I think the block.db and block.wal symlinks are created as a result of 'ceph-volume lvm prepare --bluestore --data --block.wal --block.db '. That's coded in the ceph-osd charm around here: https://opendev.org/openstack/charm-ceph-
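
To illustrate the shape of that call (the archive stripped the actual arguments from the comment above, so the device paths here are purely hypothetical placeholders):

    sudo ceph-volume lvm prepare --bluestore \
        --data /dev/disk/by-dname/bcache0 \
        --block.wal /dev/ceph-wal-vg/osd-wal-0 \
        --block.db /dev/ceph-db-vg/osd-db-0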

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread Xav Paice
journalctl --no-pager -lu systemd-udevd.service >/tmp/1828617-1.out

Hostname obfuscated.

lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread Xav Paice
udevadm info -e >/tmp/1828617-2.out

~# ls -l /var/lib/ceph/osd/ceph*
-rw--- 1 ceph ceph 69 May 21 08:44 /var/lib/ceph/osd/ceph.client.osd-upgrade.keyring

/var/lib/ceph/osd/ceph-11:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block ->

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread Xav Paice
Charm is cs:ceph-osd-284
Ceph version is 12.2.11-0ubuntu0.18.04.2

The udev rules are created by curtin during the MAAS install. Here's an example udev rule:

cat bcache4.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change",

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread Corey Bryant
Andrey, I don't know if you saw James' comment, as yours may have coincided, but if you can get the ceph-osd package version that would be helpful. Thanks!

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread Andrey Grebennikov
Yes, it is the latest - the cluster is being re-deployed as part of a Bootstack handover. Corey, the bug you point to fixes the ordering of ceph/udev. Here, however, udev can't create any devices, as they don't exist at the moment udev runs, or so it seems - when the host boots and settles down - there

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread James Page
Please can you confirm which version of the ceph-osd package you have installed; older versions rely on a charm-shipped udev ruleset, rather than it being provided by the packaging.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread Corey Bryant
This feels similar to https://bugs.launchpad.net/charm-ceph-osd/+bug/1812925. First question: are you running with the latest stable charms, which have the fix for that bug?

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread James Page
The ceph-osd package provides udev rules which should switch the owner for all ceph-related LVM VGs to ceph:ceph.

# OSD LVM layout example
# VG prefix: ceph-
# LV prefix: osd-
ACTION=="add", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="disk", \
  ENV{DM_LV_NAME}=="osd-*", \
  ENV{DM_VG_NAME}=="ceph-*",
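
A quick way to confirm the rule has taken effect on a given host (a sketch; the globs assume the VG/LV prefixes shown above, with LVM's doubled dashes in the device-mapper names):

    # dm devices backing ceph-*/osd-* LVs should end up owned by ceph:ceph
    ls -l /dev/mapper/ceph--*osd--*
    # or check the target of a specific OSD's block symlink
    ls -lL /var/lib/ceph/osd/ceph-*/block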

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread James Page
by-dname udev rules are created by MAAS/curtin as part of the server install, I think.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread Andrey Grebennikov
Steve, it is MAAS that creates these udev rules. We requested this feature so that we could use persistent names in further service configuration (using templating). We couldn't go with /dev/sdX names as they may change after a reboot, and we can't use WWN names as they

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-28 Thread Steve Langasek
> LVM module is supposed to create PVs from devices using the links in
> /dev/disk/by-dname/ folder that are created by udev.

Created by udev how? disk/by-dname is not part of the hierarchy that is populated by the standard udev rules, nor is this created by lvm2. Is there something in the

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-22 Thread Xav Paice
Just one update: if I change the ownership of the symlink I made (chown -h), the OSD will actually start. After rebooting, however, I found that the links I had made had gone again and the whole process needed repeating in order to start the OSD.
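
For anyone needing the same stopgap, the workaround amounts to something like this (a sketch; the OSD id and the ceph:ceph ownership are illustrative - note that chown -h changes the link itself rather than its target):

    sudo chown -h ceph:ceph /var/lib/ceph/osd/ceph-11/block.db
    sudo chown -h ceph:ceph /var/lib/ceph/osd/ceph-11/block.wal
    sudo systemctl start ceph-osd@11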

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-22 Thread Xav Paice
Added field-critical; there's a cloud deploy ongoing where I currently can't reboot any hosts, nor get some of the OSDs back from a host I rebooted, until we have a workaround.

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-22 Thread Xav Paice
I'm seeing this in a slightly different manner, on Bionic/Queens. We have LVMs encrypted (thanks, Vault), and rebooting a host fairly consistently results in at least one OSD not returning. The LVs appear in the list; however, the difference between a working and a non-working OSD is the lack of

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-22 Thread Xav Paice
** Tags added: canonical-bootstack

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-13 Thread David A. Desrosiers
This manifests itself as the following, as reported by lsblk(1). Note the missing Ceph LVM volume on the 6th NVMe disk:

$ cat sos_commands/block/lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda

[Bug 1828617] Re: Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

2019-05-13 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: systemd (Ubuntu)
       Status: New => Confirmed