After looking at the journal I can conclude a few things. 1) the bcache devices are found and rules run and change events emitted fairly early during boot
[ 8.873909] ubuntu kernel: bcache: bch_journal_replay() journal replay done, 673 keys in 35 entries, seq 724 [ 8.875099] ubuntu kernel: i40e 0000:5e:00.3 eth3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: RX/TX [ 8.894272] ubuntu kernel: bcache: bch_cached_dev_attach() Caching sdi3 as bcache8 on set ab6e6820-99ba-4438-9df5-3f0f02fe8515 [ 8.908230] ubuntu kernel: bcache: bch_cached_dev_attach() Caching sdi4 as bcache7 on set ab6e6820-99ba-4438-9df5-3f0f02fe8515 [ 8.919554] ubuntu kernel: bcache: register_cache() registered cache device nvme0n1p1 [ 8.976751] ubuntu kernel: bcache: bch_journal_replay() journal replay done, 0 keys in 2 entries, seq 41 2) After Cold Plug (which happens very early after rootfs is mounted and the systemd-udevd.service is started; we see a repeat of the registration (which will re-emit the change event which exports the CACHED_UUID/CACHED_LABEL values from the kernel to userspace for the bcache rules to run and generate symlinks in /dev/bcache/by-uuid (and dname rules, which also depend on the same values in the ENV when the rules run). [ 13.320515] ubuntu systemd[1]: Started udev Coldplug all Devices. ... [ 14.405166] ubuntu kernel: bcache: register_bcache() error /dev/sdc: device already registered (emitting change event) [ 14.405280] ubuntu kernel: bcache: register_bcache() error /dev/sdg: device already registered (emitting change event) (skipping the additional ones) 3) Quite some time after the system is up and running, we see repeat registrations which are the result of something invoking udevadm trigger. [ 837.100772] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: device already registered (emitting change event) [ 837.117295] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdd: device already registered (emitting change event) [ 837.120357] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdh: device already registered (emitting change event) [ 837.122496] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sde: device already registered (emitting change event) [ 837.122765] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdf: device already registered (emitting change event) [ 837.132009] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdc: device already registered (emitting change event) [ 837.160226] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdg: device already registered (emitting change event) [ 837.177862] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/nvme0n1p3: device already registered [ 837.178724] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/nvme0n1p1: device already registered [ 837.195827] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdg: device already registered (emitting change event) [ 837.196084] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdh: device already registered (emitting change event) [ 837.196107] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sda1: device already registered (emitting change event) [ 837.196351] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: device already registered (emitting change event) [ 837.196601] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sde: device already registered (emitting change event) [ 837.198407] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdd: device already registered (emitting change event) [ 837.199699] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdf: device already registered (emitting change event) [ 837.199733] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdc: device already registered (emitting change event) [ 837.207026] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdg: device already registered (emitting change event) [ 837.208943] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdd: device already registered (emitting change event) [ 837.209058] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdh: device already registered (emitting change event) [ 837.209697] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sde: device already registered (emitting change event) [ 837.209788] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdc: device already registered (emitting change event) [ 837.212004] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: device already registered (emitting change event) [ 837.213403] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdf: device already registered (emitting change event) [ 841.815986] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdi4: device already registered (emitting change event) [ 841.816737] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdi3: device already registered (emitting change event) I'm not 100% sure at which point ceph/charms make use of the dname symlinks, but ceph appears to be up much earlier than this. $ journalctl -o short-monotonic -D fd6c366f929d47a195590b3c6dc9df5a -u ceph* -- Logs begin at Tue 2019-01-29 14:20:56 UTC, end at Tue 2019-01-29 15:23:10 UTC. -- [ 290.386737] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once. [ 307.411830] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to start/stop all [email protected] instances at once. [ 308.321612] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to start/stop all [email protected] instances at once. [ 309.064386] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to start/stop all [email protected] instances at once. [ 309.911307] ln-sv-ostk01 systemd[1]: Reached target ceph target allowing to start/stop all [email protected] instances at once. 4. Right before each of the udevadm triggers, there is a snap being installed [ 811.249574] ln-sv-ostk01 systemd[1]: Reloading. [ 830.771380] ln-sv-ostk01 snapd[2172]: 2019/01/29 14:34:34.524657 api.go:1046: Installing snap "canonical-livepatch" revision unset [ 835.851602] ln-sv-ostk01 systemd[1]: Reloading. [ 836.044905] ln-sv-ostk01 systemd[1]: Mounting Mount unit for core, revision 6259... [ 836.118601] ln-sv-ostk01 systemd[1]: Mounted Mount unit for core, revision 6259. [ 836.149989] ln-sv-ostk01 snapd[2172]: 2019/01/29 14:34:39.903269 backend.go:303: cannot create host snap-confine apparmor configuration: cannot synchronize snap-confine a [ 836.172115] ln-sv-ostk01 audit[120593]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap-update-ns.core" pid=120593 comm="apparmor_parser" [ 836.296052] ln-sv-ostk01 kernel: audit: type=1400 audit(1548772479.923:175): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap-update-ns.core" pi [ 836.230674] ln-sv-ostk01 audit[120598]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.core.hook.configure" pid=120598 comm="apparmor_pars [ 836.354604] ln-sv-ostk01 kernel: audit: type=1400 audit(1548772479.979:176): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.core.hook.configur [ 837.100772] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: device already registered (emitting change event) I've confirmed with snapd folks, that snapd's udev interface will run: udevadm trigger, which will emit a CHANGE event for all subsystems, including block. And snapd will run udevamd settle --timeout=10 (which waits up to 10 seconds before exiting). This is why we see additional registrations of bcache devices at runtime. >From our udevd debugging experiment, we do know that while udevd is processing the block subsystem (at least for ADD events) the bcache symlinks get removed, until the backing and cache device get bound together in the kernel (via a bcache-register) and then the kernel can emit a CHANGE event with the CACHED_UUID value. For next steps; I believe we can recreate the scenario by constructing a system with the same bcache structure and rules in place, and then issue udevadm trigger and see how many we need to get into a state where the /dev/bcache/by-uuid symlinks are removed and not present. >From there we can see what can be done to resolve missing symlinks if these triggers happen repeatedly (which is likely given snapd behavior). -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1812925 Title: No OSDs has been initialized in random unit with "No block devices detected using current configuration" To manage notifications about this bug go to: https://bugs.launchpad.net/charm-ceph-osd/+bug/1812925/+subscriptions -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
