After looking at the journal I can conclude a few things.

1) the bcache devices are found and rules run and change events emitted
fairly early during boot

[    8.873909] ubuntu kernel: bcache: bch_journal_replay() journal replay done, 
673 keys in 35 entries, seq 724
[    8.875099] ubuntu kernel: i40e 0000:5e:00.3 eth3: NIC Link is Up, 10 Gbps 
Full Duplex, Flow Control: RX/TX
[    8.894272] ubuntu kernel: bcache: bch_cached_dev_attach() Caching sdi3 as 
bcache8 on set ab6e6820-99ba-4438-9df5-3f0f02fe8515
[    8.908230] ubuntu kernel: bcache: bch_cached_dev_attach() Caching sdi4 as 
bcache7 on set ab6e6820-99ba-4438-9df5-3f0f02fe8515
[    8.919554] ubuntu kernel: bcache: register_cache() registered cache device 
nvme0n1p1
[    8.976751] ubuntu kernel: bcache: bch_journal_replay() journal replay done, 
0 keys in 2 entries, seq 41

2) After Cold Plug (which happens very early after rootfs is mounted and
the systemd-udevd.service is started; we see a repeat of the
registration (which will re-emit the change event which exports the
CACHED_UUID/CACHED_LABEL values from the kernel to userspace for the
bcache rules to run and generate symlinks in /dev/bcache/by-uuid (and
dname rules, which also depend on the same values in the ENV when the
rules run).

[   13.320515] ubuntu systemd[1]: Started udev Coldplug all Devices.
...
[   14.405166] ubuntu kernel: bcache: register_bcache() error /dev/sdc: device 
already registered (emitting change event)
[   14.405280] ubuntu kernel: bcache: register_bcache() error /dev/sdg: device 
already registered (emitting change event)
(skipping the additional ones)

3) Quite some time after the system is up and running, we see repeat
registrations which are the result of something invoking udevadm
trigger.

[  837.100772] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: 
device already registered (emitting change event)
[  837.117295] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdd: 
device already registered (emitting change event)
[  837.120357] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdh: 
device already registered (emitting change event)
[  837.122496] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sde: 
device already registered (emitting change event)
[  837.122765] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdf: 
device already registered (emitting change event)
[  837.132009] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdc: 
device already registered (emitting change event)
[  837.160226] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdg: 
device already registered (emitting change event)
[  837.177862] ln-sv-ostk01 kernel: bcache: register_bcache() error 
/dev/nvme0n1p3: device already registered
[  837.178724] ln-sv-ostk01 kernel: bcache: register_bcache() error 
/dev/nvme0n1p1: device already registered
[  837.195827] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdg: 
device already registered (emitting change event)
[  837.196084] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdh: 
device already registered (emitting change event)
[  837.196107] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sda1: 
device already registered (emitting change event)
[  837.196351] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: 
device already registered (emitting change event)
[  837.196601] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sde: 
device already registered (emitting change event)
[  837.198407] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdd: 
device already registered (emitting change event)
[  837.199699] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdf: 
device already registered (emitting change event)
[  837.199733] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdc: 
device already registered (emitting change event)
[  837.207026] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdg: 
device already registered (emitting change event)
[  837.208943] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdd: 
device already registered (emitting change event)
[  837.209058] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdh: 
device already registered (emitting change event)
[  837.209697] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sde: 
device already registered (emitting change event)
[  837.209788] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdc: 
device already registered (emitting change event)
[  837.212004] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: 
device already registered (emitting change event)
[  837.213403] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdf: 
device already registered (emitting change event)
[  841.815986] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdi4: 
device already registered (emitting change event)
[  841.816737] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdi3: 
device already registered (emitting change event)


I'm not 100% sure at which point ceph/charms make use of the dname symlinks, 
but ceph appears to be up much earlier than this.

$ journalctl -o short-monotonic -D fd6c366f929d47a195590b3c6dc9df5a -u ceph*
-- Logs begin at Tue 2019-01-29 14:20:56 UTC, end at Tue 2019-01-29 15:23:10 
UTC. --
[  290.386737] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to 
start/stop all ceph*@.service instances at once.
[  307.411830] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to 
start/stop all [email protected] instances at once.
[  308.321612] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to 
start/stop all [email protected] instances at once.
[  309.064386] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to 
start/stop all [email protected] instances at once.
[  309.911307] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to 
start/stop all [email protected] instances at once.

4.  Right before each of the udevadm triggers, there is a snap being
installed

[  811.249574] ln-sv-ostk01 systemd[1]: Reloading.
[  830.771380] ln-sv-ostk01 snapd[2172]: 2019/01/29 14:34:34.524657 
api.go:1046: Installing snap "canonical-livepatch" revision unset
[  835.851602] ln-sv-ostk01 systemd[1]: Reloading.
[  836.044905] ln-sv-ostk01 systemd[1]: Mounting Mount unit for core, revision 
6259...
[  836.118601] ln-sv-ostk01 systemd[1]: Mounted Mount unit for core, revision 
6259.
[  836.149989] ln-sv-ostk01 snapd[2172]: 2019/01/29 14:34:39.903269 
backend.go:303: cannot create host snap-confine apparmor configuration: cannot 
synchronize snap-confine a
[  836.172115] ln-sv-ostk01 audit[120593]: AVC apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="snap-update-ns.core" 
pid=120593 comm="apparmor_parser"
[  836.296052] ln-sv-ostk01 kernel: audit: type=1400 audit(1548772479.923:175): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="snap-update-ns.core" pi
[  836.230674] ln-sv-ostk01 audit[120598]: AVC apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="snap.core.hook.configure" 
pid=120598 comm="apparmor_pars
[  836.354604] ln-sv-ostk01 kernel: audit: type=1400 audit(1548772479.979:176): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="snap.core.hook.configur
[  837.100772] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: 
device already registered (emitting change event)


I've confirmed with snapd folks, that snapd's udev interface will run:  udevadm 
trigger, which will emit a CHANGE event for all subsystems, including block.  
And snapd will run udevamd settle --timeout=10  (which waits up to 10 seconds 
before exiting).

This is why we see additional registrations of bcache devices at
runtime.

>From our udevd debugging experiment, we do know that while udevd is
processing the block subsystem (at least for ADD events) the bcache
symlinks get removed, until the backing and cache device get bound
together in the kernel (via a bcache-register) and then the kernel can
emit a CHANGE event with the CACHED_UUID value.


For next steps; I believe we can recreate the scenario by constructing a system 
with the same bcache structure and rules in place, and then issue udevadm 
trigger and see how many we need to get into a state where the 
/dev/bcache/by-uuid symlinks are removed and not present.

>From there we can see what can be done to resolve missing symlinks if
these triggers happen repeatedly (which is likely given snapd behavior).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1812925

Title:
  No OSDs has been initialized in random unit with "No block devices
  detected using current configuration"

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-osd/+bug/1812925/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to