** Changed in: cloud-archive
Status: Fix Committed => Fix Released
--
https://bugs.launchpad.net/bugs/1828617
Title:
Hosts randomly 'losing' disks, breaking ceph-osd service enumeration
This bug was fixed in the package ceph - 12.2.12-0ubuntu0.18.04.3~cloud0
---
ceph (12.2.12-0ubuntu0.18.04.3~cloud0) xenial-queens; urgency=medium
.
* New update for the Ubuntu Cloud Archive.
.
ceph (12.2.12-0ubuntu0.18.04.3) bionic; urgency=medium
.
[ James Page ]
* d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
ensure that required wal and db devices are present before
activating OSDs (LP: #1828617).
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.18.10.3~cloud0
---
ceph (13.2.6-0ubuntu0.18.10.3~cloud0) bionic; urgency=medium
.
[ Eric Desrochers ]
* Ensure that daemons are not automatically restarted during package
upgrades (LP: #1840347):
- d/rules: Use
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.19.04.4~cloud0
---
ceph (13.2.6-0ubuntu0.19.04.4~cloud0) bionic-stein; urgency=medium
.
* New update for the Ubuntu Cloud Archive.
.
ceph (13.2.6-0ubuntu0.19.04.4) disco; urgency=medium
.
[ Eric Desrochers ]
* Ensure that daemons are not automatically restarted during package
upgrades (LP: #1840347).
** Changed in: cloud-archive/train
Status: In Progress => Fix Released
--
This bug was fixed in the package ceph - 12.2.12-0ubuntu0.18.04.3
---
ceph (12.2.12-0ubuntu0.18.04.3) bionic; urgency=medium
[ James Page ]
* d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
ensure that required wal and db devices are present before
activating OSDs (LP: #1828617).
This bug was fixed in the package ceph - 13.2.6-0ubuntu0.19.04.4
---
ceph (13.2.6-0ubuntu0.19.04.4) disco; urgency=medium
[ Eric Desrochers ]
* Ensure that daemons are not automatically restarted during package
upgrades (LP: #1840347):
- d/rules: Use
Verification completed for bionic-rocky-proposed
$ apt-cache policy ceph-osd
ceph-osd:
Installed: 13.2.6-0ubuntu0.18.10.3~cloud0
Candidate: 13.2.6-0ubuntu0.18.10.3~cloud0
Version table:
*** 13.2.6-0ubuntu0.18.10.3~cloud0 500
500 http://ubuntu-cloud.archive.canonical.com/ubuntu
Verification completed on bionic-stein-proposed:
$ apt-cache policy ceph-osd
ceph-osd:
Installed: 13.2.6-0ubuntu0.19.04.4~cloud0
Candidate: 13.2.6-0ubuntu0.19.04.4~cloud0
Version table:
*** 13.2.6-0ubuntu0.19.04.4~cloud0 500
500 http://ubuntu-cloud.archive.canonical.com/ubuntu
Verification completed on disco-proposed:
$ apt-cache policy ceph-osd
ceph-osd:
Installed: 13.2.6-0ubuntu0.19.04.4
Candidate: 13.2.6-0ubuntu0.19.04.4
Version table:
*** 13.2.6-0ubuntu0.19.04.4 500
500 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 Packages
100 /var/lib/dpkg/status
bionic-proposed tested with a deployment using separate db and wal
devices; OSDs restarted reliably over 10 x reboot iterations across
three machines.
** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
--
Hello Andrey, or anyone else affected,
Accepted ceph into disco-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/ceph/13.2.6-0ubuntu0.19.04.4 in a
few hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.
** No longer affects: cloud-archive/pike
** Changed in: cloud-archive/train
Status: New => In Progress
** Changed in: cloud-archive/stein
Status: New => In Progress
** Changed in: cloud-archive/rocky
Status: New => In Progress
** Changed in: cloud-archive/queens
Status: New => In Progress
This bug was fixed in the package ceph - 14.2.2-0ubuntu2
---
ceph (14.2.2-0ubuntu2) eoan; urgency=medium
[ Eric Desrochers ]
* Ensure that daemons are not automatically restarted during package
upgrades (LP: #1840347):
- d/rules: Use "--no-restart-after-upgrade" and
** Changed in: ceph (Ubuntu Bionic)
Status: New => In Progress
** Changed in: ceph (Ubuntu Disco)
Status: New => In Progress
** Changed in: ceph (Ubuntu Disco)
Assignee: (unassigned) => James Page (james-page)
** Changed in: ceph (Ubuntu Bionic)
Assignee: (unassigned) =>
** Merge proposal linked:
https://code.launchpad.net/~slashd/ubuntu/+source/ceph/+git/ceph/+merge/371549
--
** Also affects: ceph (Ubuntu Eoan)
Importance: High
Assignee: James Page (james-page)
Status: In Progress
** Also affects: ceph (Ubuntu Disco)
Importance: Undecided
Status: New
** Also affects: ceph (Ubuntu Bionic)
Importance: Undecided
Status: New
--
Alternative fix proposed upstream - picking this in preference to
Corey's fix as it's in the right part of the codebase for ceph-volume.
--
Alternative fix: https://github.com/ceph/ceph/pull/28791
** Changed in: ceph (Ubuntu)
Assignee: Corey Bryant (corey.bryant) => James Page (james-page)
--
Building in ppa:ci-train-ppa-service/3535 (will take a few hours).
--
@David, thanks for the update. We could really use some testing of the
current proposed fix if you have a chance. That's in a PPA mentioned
above. The new code will wait for wal/db devices to arrive and has env
vars to adjust wait times - http://docs.ceph.com/docs/mimic/ceph-
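(The wait-time knobs referred to are presumably the documented
CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL variables;
a sketch of raising them via a systemd drop-in, with the unit name
assumed to be the packaged ceph-volume@.service template:)
$ sudo mkdir -p /etc/systemd/system/ceph-volume@.service.d
$ cat <<EOF | sudo tee /etc/systemd/system/ceph-volume@.service.d/retries.conf
[Service]
# Retry 60 times, 10s apart (documented defaults are 30 and 5).
Environment=CEPH_VOLUME_SYSTEMD_TRIES=60
Environment=CEPH_VOLUME_SYSTEMD_INTERVAL=10
EOF
$ sudo systemctl daemon-reload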
Just adding that I've worked around this issue with the following added
to the lvm2-monitor overrides
(/etc/systemd/system/lvm2-monitor.service.d/custom.conf):
[Service]
ExecStartPre=/bin/sleep 60
This results in 100% success for every single boot, with no missed disks
nor missed LVM volumes.
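(To apply this workaround, a drop-in like the following should do; this
is plain systemd mechanics, nothing ceph-specific:)
$ sudo mkdir -p /etc/systemd/system/lvm2-monitor.service.d
$ printf '[Service]\nExecStartPre=/bin/sleep 60\n' | \
    sudo tee /etc/systemd/system/lvm2-monitor.service.d/custom.conf
$ sudo systemctl daemon-reload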
Py2 bug found in code review upstream. Updated PPA again with fix.
--
** Changed in: ceph (Ubuntu)
Importance: Critical => High
** Changed in: ceph (Ubuntu)
Status: Triaged => In Progress
--
Note that the code looks for wal/db devices in the block device's LV
tags after it is found. In other words:
sudo lvs -o lv_tags | grep type=block | grep ceph.wal_device
sudo lvs -o lv_tags | grep type=block | grep ceph.db_device
This is the window where the following might not yet exist, yet we
I chatted with xav on IRC and he showed me a private link to the log
files. The ceph-volume-systemd.log.1 had timestamps of 2019-06-03,
which matches up with the last attempt (see comment #37).
I didn't find any logs from the new code in this log file. That likely
means one of the following: there
Any chance the log files got rotated and zipped? What does an ls of
/var/log/ceph show?
--
The pvscan issue is likely something different, just wanted to make sure
folks are aware of it for completeness.
The logs /var/log/ceph/ceph-volume-systemd.log and ceph-volume.log are
empty.
--
Do you have access to the /var/log/ceph/ceph-volume-systemd.log after
the latest reboot? That should give us some details such as:
"[2019-05-31 20:43:44,334][systemd][WARNING] failed to find db volume,
retries left: 17"
or similar for wal volume.
If you see that the retries have been exceeded
Let me word that last comment differently.
I went to the host and installed the PPA update, then rebooted.
When the box booted up, the PV which hosts the wal LVs wasn't listed in
lsblk, 'pvs', or 'lvs'. I then ran 'pvscan --cache', which brought the
LVs back online, but not the OSDs, so I
After installing that PPA update and rebooting, the PV for the wal
didn't come online till I ran pvscan --cache. Seems a second reboot
didn't do that though, might have been a red herring from prior
attempts.
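(The manual recovery described above amounts to roughly the following;
the osd id is illustrative:)
$ sudo pvscan --cache                    # re-register the missing PV
$ sudo lvs -o lv_name,lv_tags            # confirm the LVs and tags are back
$ sudo ceph-volume lvm activate --all    # or: sudo systemctl start ceph-osd@4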
Unfortunately, the OSDs didn't seem to come online in exactly the same
way after
I've cherry-picked that patch to the package in the PPA if anyone can
test. I'm fairly sure this will fix it: I've been testing by
removing/adding the volume-backed storage in my test environment, and
it will wait for the wal/db devices for a while if they don't exist.
** Changed in: ceph
Upstream pull request: https://github.com/ceph/ceph/pull/28357
--
Upstream ceph bug opened: https://tracker.ceph.com/issues/40100
** Bug watch added: tracker.ceph.com/issues #40100
http://tracker.ceph.com/issues/40100
--
The 'ceph-volume lvm trigger' call appears to come from ceph source at
src/ceph-volume/ceph_volume/systemd/main.py.
--
Thanks for testing. That should rule out udev as the cause of the race.
A couple of observations from the log:
* There is a loop for each OSD that calls 'ceph-volume lvm trigger' 30
times until the OSD is activated, for example for OSD 4:
[2019-05-31 01:27:29,235][ceph_volume.process][INFO ]
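(That loop is roughly equivalent to the following shell sketch; the real
implementation is Python in src/ceph-volume/ceph_volume/systemd/main.py,
and the 30/5 defaults are per the ceph-volume systemd docs:)
# Illustrative approximation of ceph-volume's systemd retry helper.
# "$1" is the <osd-id>-<osd-fsid> instance passed by ceph-volume@.service.
tries=${CEPH_VOLUME_SYSTEMD_TRIES:-30}
interval=${CEPH_VOLUME_SYSTEMD_INTERVAL:-5}
while [ "$tries" -gt 0 ]; do
    ceph-volume lvm trigger "$1" && break
    tries=$((tries - 1))
    sleep "$interval"
done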
Hi,
Added the 'udevadm settle --timeout=5' call in both of the two
remaining if blocks in the referenced script. That did not make a
difference.
See https://pastebin.ubuntu.com/p/8f2ZXMRNgv/ for the
ceph-volume-systemd.log.
At this boot, the OSDs numbered 4, 11 & 18 did not start, with the
missing
Note that there may only be a short window during system startup to
catch missing tags with 'sudo lvs -o lv_tags'.
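(One way to catch that window is a throwaway loop started early in boot
that snapshots the tags once a second; purely a debugging sketch:)
# Hypothetical debugging aid: log lv_tags for the first two minutes
# after boot so a missing-tag window becomes visible afterwards.
for i in $(seq 1 120); do
    { date; lvs -o lv_name,lv_tags; } >> /var/log/lv_tags_boot.log 2>&1
    sleep 1
done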
--
Some further references:
Each part of the OSD is queried for its underlying block device using
blkid:
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L114
I guess that if the block device was not visible/present at the point
that code runs
Referenced from:
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L154
--
The ceph-volume tool assembles and primes the OSD directory using the LV
tags written during the prepare action - it would be good to validate
these are OK with 'sudo lvs -o lv_tags'
The tags will contain UUID information about all of the block devices
associated with an OSD.
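(Illustratively, a healthy bluestore block LV carries tags along these
lines - all values here are made up, but the key names are the ones
ceph-volume writes:)
$ sudo lvs -o lv_tags --noheadings
ceph.block_device=/dev/ceph-<vg>/osd-block-<uuid>,ceph.db_device=/dev/ceph-<vg>/osd-db-<uuid>,
ceph.osd_fsid=<uuid>,ceph.osd_id=4,ceph.type=block,ceph.wal_device=/dev/ceph-<vg>/osd-wal-<uuid>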
--
Any output in /var/log/ceph/ceph-volume-systemd.log would also be useful
--
@Wouter, Thanks for testing. I'm rebuilding the package without the
checks as they're probably preventing the udevadm settle from running.
In the new build the 'udevadm settle --timeout=5' will run regardless.
Let's see if that helps and then we can fine tune the checks surrounding
the call later.
@Wouter, since ceph takes so long to build, you could also manually add
'udevadm settle --timeout=5' to /usr/lib/ceph/ceph-osd-prestart.sh
across the ceph-osd units to test that.
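(i.e. a one-line test edit near the top of that script, not the
packaged change:)
# Added manually to /usr/lib/ceph/ceph-osd-prestart.sh for testing:
# give udev a moment to finish creating the wal/db device nodes.
udevadm settle --timeout=5 || true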
--
Hi,
Installed the packages from the above ppa, rebooted the host and 4 out
of 7 OSDs came up. The 3 that were missing from the `ceph osd tree`
were not running the osd daemon as they lacked the symlinks to the db
and the wal.
Rebooted the server, and after the reboot other OSDs (again 3 out
Thanks, will do. FWIW, the symlinks are in place before reboot.
--
I'm building a test package for ceph with additional logic added to
/usr/lib/ceph/ceph-osd-prestart.sh to allow block.wal and block.db
additional time to settle. This is just a version to test the fix. I'm
not sure if the behavior is the same as the journal-file case (symlink
exists but file doesn't) but
I didn't recreate this but I did get a deployment on serverstack with
bluestore WAL and DB devices. That's done with:
1) juju deploy --series bionic --num-units 1 --constraints mem=2G
--config expected-osd-count=1 --config monitor-count=1 cs:ceph-mon ceph-
mon
2) juju deploy --series bionic
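(Step 2 is cut off above; per the typo-fix comment that follows,
bluestore-wal and bluestore-db needed 2G, so the deploy was presumably
along these lines - storage pool and sizes illustrative:)
juju deploy --series bionic --num-units 1 cs:ceph-osd ceph-osd \
    --storage osd-devices=cinder,10G \
    --storage bluestore-wal=cinder,2G \
    --storage bluestore-db=cinder,2G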
Couple typos in comment #19:
I think bluestore-wal and bluestore-db needed 2G.
Also s/exists/exits
--
** Package changed: systemd (Ubuntu) => ceph (Ubuntu)
--
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: ceph (Ubuntu)
Status: New => Confirmed
--
Thanks for all the details.
I need to confirm this but I think the block.db and block.wal symlinks
are created as a result of 'ceph-volume lvm prepare --bluestore --data
<device> --block.wal <device> --block.db <device>'.
That's coded in the ceph-osd charm around here:
https://opendev.org/openstack/charm-ceph-
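(Spelled out with hypothetical device arguments, that prepare call
looks like the following; vg/lv names are made up:)
$ sudo ceph-volume lvm prepare --bluestore \
    --data /dev/sdb \
    --block.wal ceph-wal-vg/osd-wal-0 \
    --block.db ceph-db-vg/osd-db-0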
journalctl --no-pager -lu systemd-udevd.service >/tmp/1828617-1.out
Hostname obfuscated
lsblk:
NAME                                        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0
udevadm info -e >/tmp/1828617-2.out
~# ls -l /var/lib/ceph/osd/ceph*
-rw------- 1 ceph ceph 69 May 21 08:44
/var/lib/ceph/osd/ceph.client.osd-upgrade.keyring
/var/lib/ceph/osd/ceph-11:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block ->
Charm is cs:ceph-osd-284
Ceph version is 12.2.11-0ubuntu0.18.04.2
The udev rules are created by curtin during the maas install.
Here's an example udev rule:
cat bcache4.rules
# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change",
Andrey, I don't know if you saw James' comment as yours may have
coincided but if you can get the ceph-osd package version that would be
helpful. Thanks!
--
Yes, it is the latest - the cluster is being re-deployed as part of the
Bootstack handover.
Corey,
The bug you point to fixes the ordering of ceph/udev. Here, however,
udev can't create any devices, as they don't exist at the moment udev
runs, it seems - when the host boots and settles down - there
Please can you confirm which version of the ceph-osd package you have
installed; older versions rely on a charm-shipped udev ruleset, rather
than it being provided by the packaging.
--
This feels similar to
https://bugs.launchpad.net/charm-ceph-osd/+bug/1812925. First question,
are you running with the latest stable
charms which have the fix for that bug?
--
The ceph-osd package provides udev rules which should switch the owner
for all ceph-related LVM VGs to ceph:ceph.
# OSD LVM layout example
# VG prefix: ceph-
# LV prefix: osd-
ACTION=="add", SUBSYSTEM=="block", \
ENV{DEVTYPE}=="disk", \
ENV{DM_LV_NAME}=="osd-*", \
ENV{DM_VG_NAME}=="ceph-*",
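(Whether the rule fired can be checked after boot with something like
the following; device names are illustrative:)
$ ls -l /dev/mapper/ceph--*-osd--*      # owner should be ceph:ceph
$ udevadm info -q property -n /dev/dm-4 | grep -E 'DM_LV_NAME|DM_VG_NAME'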
by-dname udev rules are created by MAAS/curtin as part of the server
install I think.
--
Steve,
It is MAAS that creates these udev rules. We requested this feature be
implemented in order to be able to use persistent names in subsequent
service configuration (using templating). We couldn't go with /dev/sdX
names as they may change after a reboot, and can't use wwn names as they
> LVM module is supposed to create PVs from devices using the links in
> /dev/disk/by-dname/
> folder that are created by udev.
Created by udev how? disk/by-dname is not part of the hierarchy that is
populated by the standard udev rules, nor is this created by lvm2. Is
there something in the
Just one update: if I change the permissions of the symlink I made
(chown -h), the OSD will actually start.
After rebooting, however, I found that the links I had made had gone
again and the whole process needed repeating in order to start the OSD.
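(For the record, that interim workaround amounts to the following; the
osd id is illustrative:)
$ sudo chown -h ceph:ceph /var/lib/ceph/osd/ceph-11/block.db \
      /var/lib/ceph/osd/ceph-11/block.wal
$ sudo systemctl start ceph-osd@11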
--
Added field-critical, there's a cloud deploy ongoing where I currently
can't reboot any hosts, nor get some of the OSDs back from a host I
rebooted, until we have a workaround.
--
I'm seeing this in a slightly different manner, on Bionic/Queens.
We have our LVs encrypted (thanks, Vault), and rebooting a host fairly
consistently results in at least one OSD not returning. The LVs appear
in the list; however, the difference between a working and a
non-working OSD is the lack of
** Tags added: canonical-bootstack
--
This manifests itself as the following, as reported by lsblk(1). Note
the missing Ceph LVM volume on the 6th NVMe disk:
$ cat sos_commands/block/lsblk
NAME                                        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: systemd (Ubuntu)
Status: New => Confirmed
--