Public bug reported:

OSDs access and modify epoch maps from multiple threads without holding
a lock. This leads to a race condition and results in a crash due to
iterator invalidation.
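
For illustration only, here is a minimal sketch of the general pattern that
avoids this class of crash: every reader and writer of a shared std::map takes
the same mutex. The names EpochTable, insert, trim_before and newest_epoch are
hypothetical and do not come from Ceph; this is not the upstream patch, which
is in the pull requests linked in this report. On libstdc++, stepping a map
iterator backwards (as rbegin() does) goes through std::_Rb_tree_decrement,
the frame at the top of the trace, so an unguarded erase on another thread can
leave that walk following a freed node.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for a table keyed by epoch; not Ceph's actual type.
class EpochTable {
public:
  // Writer: add or overwrite the entry for one epoch.
  void insert(std::uint32_t epoch, int value) {
    std::lock_guard<std::mutex> guard(lock_);
    by_epoch_[epoch] = value;
  }

  // Writer: drop every entry whose epoch is strictly below `bound`.
  void trim_before(std::uint32_t bound) {
    std::lock_guard<std::mutex> guard(lock_);
    by_epoch_.erase(by_epoch_.begin(), by_epoch_.lower_bound(bound));
  }

  // Reader: store the newest epoch in *out; returns false if the table is empty.
  // rbegin() steps back from end(), so it must not race with an erase.
  bool newest_epoch(std::uint32_t* out) const {
    std::lock_guard<std::mutex> guard(lock_);
    if (by_epoch_.empty()) {
      return false;
    }
    *out = by_epoch_.rbegin()->first;
    return true;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return by_epoch_.size();
  }

private:
  mutable std::mutex lock_;              // guards by_epoch_
  std::map<std::uint32_t, int> by_epoch_;
};

// Small single-threaded usage example.
inline void demo_epoch_table() {
  EpochTable table;
  std::uint32_t newest = 0;
  assert(!table.newest_epoch(&newest));  // empty table has no newest epoch
  table.insert(1, 10);
  table.insert(2, 20);
  table.insert(3, 30);
  table.trim_before(3);                  // removes epochs 1 and 2
  assert(table.newest_epoch(&newest) && newest == 3);
  assert(table.size() == 1);
}
```

Because the lock is taken inside each method, a timer thread calling
newest_epoch() cannot observe the tree while another thread is in the middle
of trim_before().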

A typical stack trace looks like:

Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]: *** Caught signal (Segmentation fault) **
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  in thread 7e821b800640 thread_name:safe_timer
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7e8226442520]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  2: (std::_Rb_tree_decrement(std::_Rb_tree_node_base const*)+0xe) [0x7e82268c65ee]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  3: (OSD::tick_without_osd_lock()+0x4ac) [0x5e326eb66c6c]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  4: (Context::complete(int)+0xd) [0x5e326eb8971d]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  5: (CommonSafeTimer<std::mutex>::timer_thread()+0x12d) [0x5e326f1f64ed]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  6: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x5e326f1f7991]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  7: /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7e8226494ac3]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  8: /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7e8226526850]
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]: 2025-07-05T01:34:19.417+0000 7e821b800640 -1 *** Caught signal (Segmentation fault) **
Jul 05 01:34:19 ps7-ra1-n1 ceph-osd[2083916]:  in thread 7e821b800640 thread_name:safe_timer

This has been fixed upstream, and a backport to Squid is in progress.

Upstream bug tracker: https://tracker.ceph.com/issues/66819
Upstream patch: https://github.com/ceph/ceph/pull/62916
Bug tracker for Squid: https://tracker.ceph.com/issues/72070
Backport patch for Squid: https://github.com/ceph/ceph/pull/64732

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: ceph (Ubuntu Noble)
     Importance: Undecided
         Status: New

** Affects: ceph (Ubuntu Plucky)
     Importance: Undecided
         Status: New

** Affects: ceph (Ubuntu Questing)
     Importance: Undecided
         Status: New


** Tags: sts

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2121931

Title:
  OSD crash in OSD::tick_without_osd_lock() due to race condition

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2121931/+subscriptions

