Public bug reported:

OSDs access and modify epoch maps from multiple threads without holding a lock. This leads to a race condition and results in a crash due to iterator invalidation.
Typical stack trace looks like:

Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]: *** Caught signal (Segmentation fault) **
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  in thread 7e821b800640 thread_name:safe_timer
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7e8226442520]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  2: (std::_Rb_tree_decrement(std::_Rb_tree_node_base const*)+0xe) [0x7e82268c65ee]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  3: (OSD::tick_without_osd_lock()+0x4ac) [0x5e326eb66c6c]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  4: (Context::complete(int)+0xd) [0x5e326eb8971d]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  5: (CommonSafeTimer<std::mutex>::timer_thread()+0x12d) [0x5e326f1f64ed]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  6: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x5e326f1f7991]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  7: /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7e8226494ac3]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  8: /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7e8226526850]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]: 2025-07-05T01:34:19.417+0000 7e821b800640 -1 *** Caught signal (Segmentation fault) **
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  in thread 7e821b800640 thread_name:safe_timer

This has been fixed upstream, and a backport to Squid is in progress.
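
For illustration only, here is a minimal sketch of the locking pattern that avoids this class of crash. It is not the upstream patch and does not use Ceph's real types: EpochTable, its members, and run_concurrent_demo are hypothetical names made up for this example. The std::_Rb_tree_decrement frame in the trace above is consistent with a read of the last element of a std::map (for example through rbegin()) racing with a concurrent insert or erase, so the sketch takes the same mutex on both the reader and writer paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a per-epoch table; NOT Ceph's actual structure.
class EpochTable {
public:
    // Writer path: record a value for an epoch.
    void insert(std::uint64_t epoch, int value) {
        std::lock_guard<std::mutex> guard(lock_);
        entries_[epoch] = value;
    }

    // Writer path: drop every epoch older than oldest_kept.
    void trim_before(std::uint64_t oldest_kept) {
        std::lock_guard<std::mutex> guard(lock_);
        entries_.erase(entries_.begin(), entries_.lower_bound(oldest_kept));
    }

    // Reader path (e.g. a periodic tick): fetch the newest epoch, if any.
    // rbegin() steps back from end() inside the tree, so it must not run
    // while another thread inserts into or erases from the same map.
    bool newest_epoch(std::uint64_t& out) const {
        std::lock_guard<std::mutex> guard(lock_);
        if (entries_.empty()) {
            return false;
        }
        out = entries_.rbegin()->first;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(lock_);
        return entries_.size();
    }

private:
    mutable std::mutex lock_;             // guards entries_
    std::map<std::uint64_t, int> entries_;
};

// Runs one writer and one reader thread against the same table and
// returns the newest epoch present once both have finished (0 if none).
inline std::uint64_t run_concurrent_demo(std::uint64_t epochs) {
    EpochTable table;
    std::thread writer([&table, epochs] {
        for (std::uint64_t e = 1; e <= epochs; ++e) {
            table.insert(e, static_cast<int>(e));
            if (e > 8) {
                table.trim_before(e - 8);  // keep a short window of epochs
            }
        }
    });
    std::thread reader([&table, epochs] {
        std::uint64_t seen = 0;
        for (std::uint64_t i = 0; i < epochs; ++i) {
            table.newest_epoch(seen);      // safe: same lock as the writer
        }
    });
    writer.join();
    reader.join();
    std::uint64_t newest = 0;
    table.newest_epoch(newest);
    return newest;
}
```

Removing the lock_guard from newest_epoch() would reintroduce the kind of unsynchronized read described in this report. A single mutex is used here for simplicity; whether the real fix uses a mutex, a shared lock, or a different structure is not stated in this report.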
Upstream bug tracker: https://tracker.ceph.com/issues/66819
Upstream patch: https://github.com/ceph/ceph/pull/62916
Bug tracker for Squid: https://tracker.ceph.com/issues/72070
Backport patch for Squid: https://github.com/ceph/ceph/pull/64732

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: ceph (Ubuntu Noble)
     Importance: Undecided
         Status: New

** Affects: ceph (Ubuntu Plucky)
     Importance: Undecided
         Status: New

** Affects: ceph (Ubuntu Questing)
     Importance: Undecided
         Status: New

** Tags: sts

** Tags added: sts

** Description changed:

- upstream bug tracker: https://tracker.ceph.com/issues/66819
- upstream patch: https://github.com/ceph/ceph/pull/62916
+ Upstream bug tracker: https://tracker.ceph.com/issues/66819
+ Upstream patch: https://github.com/ceph/ceph/pull/62916

--
https://bugs.launchpad.net/bugs/2121931

Title:
  OSD crash in OSD::tick_without_osd_lock() due to race condition
