[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-07-21 Thread Mauricio Faria de Oliveira
This was actually fixed in v14.2.22 [1].

"""
This is the 22nd and likely the last backport release in the Nautilus series. 
Ultimately, we recommend all users upgrade to newer Ceph releases.
...
rgw: beast frontend uses 512k mprotected coroutine stacks (pr#39947, Yaakov 
Selkowitz, Mauricio Faria de Oliveira, Daniel Gryniewicz, Casey Bodley)
"""

[1] https://docs.ceph.com/en/latest/releases/nautilus/#v14-2-22-nautilus
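
For illustration, the mitigation named in the release note above amounts to
giving each coroutine a fixed-size (512k) stack with an mprotect'ed guard
page at its low end, so an overflow faults immediately instead of silently
corrupting adjacent memory. A minimal C++ sketch of the idea (not the actual
ceph code; only the 512k figure comes from the release note):

  // Sketch: allocate a coroutine stack with a PROT_NONE guard page.
  #include <sys/mman.h>
  #include <unistd.h>
  #include <cassert>
  #include <cstddef>

  struct StackBuf { void* base; size_t len; };

  StackBuf alloc_coroutine_stack(size_t size = 512 * 1024) {
      const size_t page = (size_t)sysconf(_SC_PAGESIZE);
      // One extra page below the usable stack serves as the guard.
      void* base = mmap(nullptr, size + page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      assert(base != MAP_FAILED);
      // Stacks grow down: make the lowest page inaccessible so an
      // overflow raises SIGSEGV right away, instead of scribbling over
      // neighboring allocations as in the coredumps below.
      if (mprotect(base, page, PROT_NONE) != 0) {
          munmap(base, size + page);
          assert(false && "mprotect failed");
      }
      return {base, size + page};
  }

  void free_coroutine_stack(const StackBuf& s) { munmap(s.base, s.len); }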

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1921749

Title:
  nautilus: ceph radosgw beast frontend coroutine stack corruption

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1921749/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-06-21 Thread Mauricio Faria de Oliveira
It seems unlikely there will be another Nautilus release,
as the ceph releases index page [1] shows it as EOL
since 2021-06-01 (~3 weeks ago).

Thus, closing this bug.

[1] https://docs.ceph.com/en/latest/releases/

** Changed in: cloud-archive/train
   Status: Confirmed => Invalid


[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-05-14 Thread Mauricio Faria de Oliveira
Another ceph hotfix release (v14.2.21) has pushed this fix one release
further, thus at least v14.2.22.


[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-04-21 Thread Mauricio Faria de Oliveira
Another ceph bugfix release (v14.2.20) has pushed this fix one release
further, thus at least v14.2.21.

** Description changed:

  [Impact]
  
  The radosgw beast frontend in ceph nautilus might hit coroutine stack
  corruption on startup and requests.
  
  This is usually observed right at the startup of the ceph-radosgw systemd
  unit, sometimes a minute later, but it might occur at any time while
  handling requests, depending on the coroutine/request's function path and
  stack size.
  
  The symptoms are usually a crash with a stack trace listing TCMalloc
  (de)allocate/release-to-central-cache functions; less common signs are
  large allocs in the _terabytes_ range (a stack pointer used as the
  allocation size) and stack traces showing function return addresses (RIP)
  that are actually pointers to a stack address.
  
  This is not widely hit in Ubuntu, as most deployments use the ceph-radosgw
  charm, which hardcodes 'civetweb' as the rgw frontend and is _not_ affected;
  custom/cephadm deployments that choose 'beast' might hit this.

    @ charm-ceph-radosgw/templates/ceph.conf
    rgw frontends = civetweb port={{ port }}
  
  Let's report this LP bug for documentation and tracking purposes until
  the UCA gets the fixes.
  
  [Fix]
  
  This was reported by an Ubuntu Advantage user, and by another user in ceph
  tracker #47910 [1]. It had already been reported and fixed in Octopus [2]
  (confirmed by the UA user, who is no longer affected).
  
  The Nautilus backport has recently been merged [3, 4] and should be
- available in v14.2.20.
+ available in v14.2.21.
  
  [Test Case]
  
  The conditions to trigger the bug aren't clear, but they apparently relate
  to EC pools with very large buckets, and of course require the radosgw
  beast frontend to be enabled (civetweb is not affected).
  
  [Where problems could occur]
  
  The fixes are restricted to the beast frontend, specifically to the
  coroutines used to handle requests, so problems would probably be seen
  only in request handling with the beast frontend.
  Workarounds thus include switching back to the civetweb frontend
  (see the config sketch after the references below).

  These changes touch core/base parts of the RGW beast frontend code, but
  they have been in place since the Octopus release. The other user/reporter
  in the ceph tracker has been using the patches for weeks with no
  regressions; the ceph tests have passed, and serious issues would likely
  be caught by upstream ceph CI.
  
  [1] https://tracker.ceph.com/issues/47910 report tracker (nautilus)
  [2] https://tracker.ceph.com/issues/43739 master tracker (octopus)
  [3] https://tracker.ceph.com/issues/43921 backport tracker (nautilus)
  [4] https://github.com/ceph/ceph/pull/39947 github PR
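
For reference, the civetweb workaround mentioned under [Where problems could
occur] is a one-line frontend switch in ceph.conf, followed by a restart of
the ceph-radosgw unit. A sketch; the section name and port below are
examples, adjust to the deployment:

  [client.rgw.HOSTNAME]
  #rgw frontends = beast port=7480
  rgw frontends = civetweb port=7480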


[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-04-05 Thread Mauricio Faria de Oliveira
** Description changed:

  [Impact]
  
  The radosgw beast frontend in ceph nautilus might hit coroutine stack
  corruption on startup and requests.
  
  This is usually observed right at the startup of the ceph-radosgw systemd
  unit, sometimes a minute later, but it might occur at any time while
  handling requests, depending on the coroutine/request's function path and
  stack size.
  
  The symptoms are usually a crash with a stack trace listing TCMalloc
  (de)allocate/release-to-central-cache functions; less common signs are
  large allocs in the _terabytes_ range (a stack pointer used as the
  allocation size) and stack traces showing function return addresses (RIP)
  that are actually pointers to a stack address.
  
  This is not widely hit in Ubuntu, as most deployments use the ceph-radosgw
  charm, which hardcodes 'civetweb' as the rgw frontend and is _not_ affected;
  custom/cephadm deployments that choose 'beast' might hit this.
  
-   @ charm-ceph-radosgw/templates/ceph.conf
- rgw frontends = civetweb port={{ port }}
+   @ charm-ceph-radosgw/templates/ceph.conf
+ rgw frontends = civetweb port={{ port }}
  
  Let's report this LP bug for documentation and tracking purposes until
  the UCA gets the fixes.
  
  [Fix]
  
  This was reported by an Ubuntu Advantage user, and by another user in ceph
  tracker #47910 [1]. It had already been reported and fixed in Octopus [2]
  (confirmed by the UA user, who is no longer affected).
  
  The Nautilus backport has recently been merged [3, 4] and should be
- available in v14.2.19.
+ available in v14.2.20.
  
  [Test Case]
  
  The conditions to trigger the bug aren't clear, but they apparently relate
  to EC pools with very large buckets, and of course require the radosgw
  beast frontend to be enabled (civetweb is not affected).
  
  [Where problems could occur]
  
  The fixes are restricted to the beast frontend, specifically to the
  coroutines used to handle requests, so problems would probably be seen
  only in request handling with the beast frontend.
  Workarounds thus include switching back to the civetweb frontend.

  These changes touch core/base parts of the RGW beast frontend code, but
  they have been in place since the Octopus release. The other user/reporter
  in the ceph tracker has been using the patches for weeks with no
  regressions; the ceph tests have passed, and serious issues would likely
  be caught by upstream ceph CI.
  
  [1] https://tracker.ceph.com/issues/47910 report tracker (nautilus)
  [2] https://tracker.ceph.com/issues/43739 master tracker (octopus)
  [3] https://tracker.ceph.com/issues/43921 backport tracker (nautilus)
  [4] https://github.com/ceph/ceph/pull/39947 github PR


[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-03-29 Thread Mauricio Faria de Oliveira
Comments #1-#6: example stack traces and GDB debug snippets of the
issue.


[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-03-29 Thread Mauricio Faria de Oliveira
coredump #6

The RIP address in the stack trace is actually not an instruction address,
but a stack address (corrupted stack).
GDB tried hard to unwind the stack and produced a very long trace, which
might not be correct.

Oct 23 16:41:27 HOSTNAME radosgw[3572]: *** Caught signal (Segmentation 
fault) **
Oct 23 16:41:27 HOSTNAME radosgw[3572]:  in thread 7f826a190700 
thread_name:radosgw
Oct 23 16:41:27 HOSTNAME radosgw[3572]:  ceph version 14.2.11 
(f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
Oct 23 16:41:27 HOSTNAME radosgw[3572]:  1: (()+0x128a0) 
[0x7f82987d68a0]
Oct 23 16:41:27 HOSTNAME radosgw[3572]:  2: [0x56275c77ac20]
Oct 23 16:41:27 HOSTNAME radosgw[3572]: 2020-10-23 16:41:27.433 
7f826a190700 -1 *** Caught signal (Segmentation fault) **

The RIP lies within the stack address range of the previous frame:

(gdb) frame 5
#5  0x7f829a26e83d in AsyncConnection::send_message 
(this=0x56275c71ad00, m=0x56275c76f600) at 
./src/msg/async/AsyncConnection.cc:548
(gdb) info reg $rbp
rbp            0x56275c76f600      0x56275c76f600
(gdb) info reg $rsp
rsp            0x56275c782480      0x56275c782480
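
In other words (a sketch of the check; $pc follows the selected frame):

(gdb) frame 4
(gdb) p/x $pc
$1 = 0x56275c77ac20

That value lies numerically between the $rbp (0x56275c76f600) and $rsp
(0x56275c782480) values above, i.e. inside the coroutine stack region,
and 'info proc mappings' would place it in an anonymous read-write
mapping rather than an executable one.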

#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x562758ea16b0 in reraise_fatal (signum=11) at 
./src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at 
./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  0x56275c77ac20 in ?? ()
#5  0x7f829a26e83d in AsyncConnection::send_message 
(this=0x56275c71ad00, m=0x56275c76f600) at 
./src/msg/async/AsyncConnection.cc:548
#6  0x7f82a32ee135 in Objecter::_send_op 
(this=this@entry=0x56275bc35080, op=op@entry=0x56275c76f000) at 
./src/osdc/Objecter.cc:3274
#7  0x7f82a32f0433 in Objecter::_op_submit 
(this=this@entry=0x56275bc35080, op=op@entry=0x56275c76f000, sul=..., 
ptid=ptid@entry=0x56275c7827e8) at ./src/osdc/Objecter.cc:2456
#8  0x7f82a32fb43d in Objecter::_op_submit_with_budget 
(this=this@entry=0x56275bc35080, op=op@entry=0x56275c76f000, sul=..., 
ptid=ptid@entry=0x56275c7827e8, ctx_budget=ctx_budget@entry=0x0) at 
./src/osdc/Objecter.cc:2284
#9  0x7f82a32fb680 in Objecter::op_submit (this=0x56275bc35080, 
op=0x56275c76f000, ptid=0x56275c7827e8, ctx_budget=0x0) at 
./src/osdc/Objecter.cc:2251
#10 0x7f82a32c2e3a in librados::IoCtxImpl::operate_read 
(this=0x56275bd83ee0, oid=..., o=0x56275bdaa180, pbl=pbl@entry=0x0, 
flags=flags@entry=0) at ./src/librados/IoCtxImpl.cc:725
#11 0x7f82a32987ec in librados::v14_2_0::IoCtx::operate 
(this=this@entry=0x56275c782c98, oid=..., o=o@entry=0x56275c782b80, 
pbl=pbl@entry=0x0) at ./src/librados/librados_cxx.cc:1423
#12 0x562759244dbc in rgw_rados_operate (ioctx=..., oid=..., 
op=op@entry=0x56275c782b80, pbl=pbl@entry=0x0, y=...) at 
./src/rgw/rgw_tools.cc:218
#13 0x5627592b7fbf in RGWSI_RADOS::Obj::operate 
(this=this@entry=0x56275c782c10, op=op@entry=0x56275c782b80, pbl=pbl@entry=0x0, 
y=...) at ./src/rgw/services/svc_rados.cc:96
#14 0x562758f33a22 in RGWSI_SysObj_Core::read 
(this=this@entry=0x56275afe7540, obj_ctx=..., read_state=..., 
objv_tracker=objv_tracker@entry=0x56275c783948, obj=..., 
bl=bl@entry=0x56275c783460, ofs=0, end=-1, attrs=0x56275c782dc0, 
raw_attrs=true, cache_info=0x56275c783680) at 
./src/rgw/services/svc_sys_obj_core.cc:222
#15 0x5627592bbe7d in RGWSI_SysObj_Cache::read 
(this=this@entry=0x56275afe7540, obj_ctx=..., read_state=..., 
objv_tracker=0x56275c783948, obj=..., obl=obl@entry=0x56275c783460, ofs=0, 
end=-1, attrs=0x56275c783b90, raw_attrs=false, cache_info=0x56275c783680, 
refresh_version=...) at ./src/rgw/services/svc_sys_obj_cache.cc:147
#16 0x562758f2fbcb in RGWSI_SysObj::Obj::ROp::read 
(this=this@entry=0x56275c783260, ofs=ofs@entry=0, end=end@entry=-1, 
bl=bl@entry=0x56275c783460) at ./src/rgw/services/svc_sys_obj.cc:47
#17 0x5627592426c3 in RGWSI_SysObj::Obj::ROp::read 
(pbl=0x56275c783460, this=0x56275c783260) at ./src/rgw/services/svc_sys_obj.h:99
#18 rgw_get_system_obj (rgwstore=rgwstore@entry=0x56275b073800, 
obj_ctx=..., pool=..., key=..., bl=..., 
objv_tracker=objv_tracker@entry=0x56275c783948, pmtime=0x56275c783b88, 
pattrs=0x56275c783b90, cache_info=0x56275c783680, refresh_version=...) at 
./src/rgw/rgw_tools.cc:156
#19 0x562759198257 in RGWRados::get_bucket_instance_from_oid 
(this=this@entry=0x56275b073800, obj_ctx=..., oid=..., info=..., 
pmtime=pmtime@entry=0x56275c783b88, pattrs=pattrs@entry=0x56275c783b90, 
cache_info=0x56275c783680, refresh_version=...) at ./src/rgw/rgw_rados.cc:8250
#20 0x56275919b5ad in RGWRados::_get_bucket_info 
(this=0x56275b073800, obj_ctx=..., tenant=..., bucket_name=..., info=..., 
pmtime=pmtime@entry=0x0, pattrs=0x56275c785fb8, refresh_version=...) at 
./src/rgw/rgw_rados.cc:8405
   

[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-03-29 Thread Mauricio Faria de Oliveira
coredump #5

Oct 23 16:41:01 HOSTNAME radosgw[1616]: tcmalloc: large alloc 
94195528343552 bytes == (nil) @  0x7f97ec494887 0x7f97ec1cb1b2 0x7f97ec1ff948 
0x7f97ec1d08be 0x7f97ec1db01e 0x7f97ec1dd433 0x7f97ec1e843d 0x7f97ec1e8680 
0x7f97ec1afe3a 0x7f97ec1857ec 0x55ab96ae9dbc 0x55ab967d8a22 0x55ab96b60e7d 
0x55ab967d4bcb 0x55ab96ae76c3 0x55ab96af598a 0x55ab967c5a1a 0x55ab9667e1cf 
0x55ab96801118 0x55ab96801c87 0x55ab96766a0b 0x55ab966bd660 0x55ab966be86d 
0x55ab96bfad5f
Oct 23 16:41:01 HOSTNAME radosgw[1616]: terminate called without an 
active exception
Oct 23 16:41:01 HOSTNAME radosgw[1616]: *** Caught signal (Aborted) **
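
Note the "large alloc" size is itself pointer-shaped:
94195528343552 == 0x55ab9b01a000, in the same 0x55ab... region as the
pointers in the trace below (e.g. req=0x55ab9b020930 sits ~26 KiB above it),
consistent with a stack address being interpreted as an allocation size.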

#0  raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x55ab967466b0 in reraise_fatal (signum=6) at 
./src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=6) at ./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:51
#5  0x7f97e09c18b1 in __GI_abort () at abort.c:79
#6  0x7f97e13b4957 in __gnu_cxx::__verbose_terminate_handler () at 
../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#7  0x7f97e13baae6 in __cxxabiv1::__terminate (handler=) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:47
#8  0x7f97e13bab21 in std::terminate () at 
../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:57
#9  0x55ab96801eb0 in rgw::auth::Strategy::apply 
(dpp=0x55ab9a627000, auth_strategy=..., s=) at 
./src/rgw/rgw_auth.cc:273
#10 0x55ab96766a0b in process_request (store=0x55ab99917800, 
rest=0x7fff342285a0, req=0x55ab9b020930, frontend_prefix=..., 
auth_registry=..., client_io=client_io@entry=0x55ab9b0209c0, olog=0x0, 
yield=..., scheduler=0x55ab9ac1f928, http_ret=0x0) at 
./src/rgw/rgw_process.cc:251
#11 0x55ab966bd660 in (anonymous 
namespace)::handle_connection
 > (context=..., env=..., stream=..., buffer=..., pause_mutex=..., 
scheduler=, ec=..., yield=..., is_ssl=false) at 
./src/rgw/rgw_asio_frontend.cc:167
#12 0x55ab966be86d in (anonymous 
namespace)::AsioFrontendoperator() 
(yield=..., __closure=0x55ab9a8aa1e8) at ./src/rgw/rgw_asio_frontend.cc:638
#13 
boost::asio::detail::coro_entry_point >, (anonymous 
namespace)::AsioFrontend::accept((anonymous 
namespace)::AsioFrontend::Listener&, 
boost::system::error_code):: >::operator() 
(ca=..., this=) at 
./obj-x86_64-linux-gnu/boost/include/boost/asio/impl/spawn.hpp:337
#14 
boost::coroutines::detail::push_coroutine_object,
 void, boost::asio::detail::coro_entry_point >, 
(anonymous namespace)::AsioFrontend::accept((anonymous 
namespace)::AsioFrontend::Listener&, 
boost::system::error_code):: >&, 
boost::coroutines::basic_standard_stack_allocator
 >::run (this=0x55ab9b021f60) at 
./obj-x86_64-linux-gnu/boost/include/boost/coroutine/detail/push_coroutine_object.hpp:302
#15 
boost::coroutines::detail::trampoline_push_void,
 void, boost::asio::detail::coro_entry_point >, 
(anonymous namespace)::AsioFrontend::accept((anonymous 
namespace)::AsioFrontend::Listener&, 
boost::system::error_code):: >&, 
boost::coroutines::basic_standard_stack_allocator
 > >(boost::context::detail::transfer_t) (t=...) at 
./obj-x86_64-linux-gnu/boost/include/boost/coroutine/detail/trampoline_push.hpp:70
#16 0x55ab96bfad5f in make_fcontext ()
#17 0x55ab9703fcd0 in vtable for 
boost::coroutines::detail::push_coroutine_object,
 void, boost::asio::detail::coro_entry_point >, 
(anonymous namespace)::AsioFrontend::accept((anonymous 
namespace)::AsioFrontend::Listener&, 
boost::system::error_code)::{lambda(boost::asio::basic_yield_context >)#4}>&, 
boost::coroutines::basic_standard_stack_allocator
 > ()
#18 0x0026 in ?? ()
#19 0x in ?? ()


The ceph error message provides more stack frames than GDB does:

(gdb) info symbol 

0x7f97ec494887 tc_newarray + 455 in section google_malloc of 
/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
0x7f97ec1cb1b2 void std::__cxx11::basic_string, std::allocator >::_M_construct(char*, 
char*, std::forward_iterator_tag)
0x7f97ec1ff948 std::vector 
>::operator=(std::vector > const&)
0x7f97ec1d08be Objecter::_prepare_osd_op
0x7f97ec1db01e Objecter::_send_op
0x7f97ec1dd433 Objecter::_op_submit
0x7f97ec1e843d Objecter::_op_submit_with_budget
0x7f97ec1e8680 Objecter::op_submit
0x7f97ec1afe3a librados::IoCtxImpl::operate_read
0x7f97ec1857ec librados::v14_2_0::IoCtx::operate
0x55ab96ae9dbc rgw_rados_operate
0x55ab967d8a22 RGWSI_SysObj_Core::read
0x55ab96b60e7d RGWSI_SysObj_Cache::read
0x55ab967d4bcb RGWSI_SysObj::Obj::ROp::read
0x55ab96ae76c3 rgw_get_system_obj
0x55ab96af598a rgw_get_user_info_from_index
0x55ab967c5a1a 

[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-03-29 Thread Mauricio Faria de Oliveira
coredump #4

A shorter stack trace is reported in the ceph logs than in GDB.

Oct 23 16:41:28 HOSTNAME radosgw[4319]: *** Caught signal (Segmentation 
fault) **
Oct 23 16:41:28 HOSTNAME radosgw[4319]:  in thread 7fb79e999700 
thread_name:msgr-worker-2
Oct 23 16:41:28 HOSTNAME radosgw[4319]:  ceph version 14.2.11 
(f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
Oct 23 16:41:28 HOSTNAME radosgw[4319]:  1: (()+0x128a0) 
[0x7fb7a747d8a0]
Oct 23 16:41:28 HOSTNAME radosgw[4319]:  2: 
(tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, 
unsigned long, int)+0xdb) [0x7fb7b223dbcb]
Oct 23 16:41:28 HOSTNAME radosgw[4319]:  3: 
(tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned 
long)+0x1b) [0x7fb7b223dc9b]
Oct 23 16:41:28 HOSTNAME radosgw[4319]:  4: (cfree()+0x2d5) 
[0x7fb7b224c6f5]


#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x5565a2ff16b0 in reraise_fatal (signum=11) at 
./src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at 
./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  tcmalloc::SLL_Next (t=0x0) at src/linked_list.h:45
#5  tcmalloc::SLL_PopRange (end=, start=, N=158, head=0x5565a3cd8bf0) at src/linked_list.h:76
#6  tcmalloc::ThreadCache::FreeList::PopRange (end=, 
start=, N=158, this=0x5565a3cd8bf0) at src/thread_cache.h:225
#7  tcmalloc::ThreadCache::ReleaseToCentralCache 
(this=this@entry=0x5565a3cd8a40, src=src@entry=0x5565a3cd8bf0, cl=, N=158, N@entry=273) at src/thread_cache.cc:195
#8  0x7fb7b223dc9b in tcmalloc::ThreadCache::ListTooLong 
(this=this@entry=0x5565a3cd8a40, list=0x5565a3cd8bf0, cl=) at 
src/thread_cache.cc:157
#9  0x7fb7b224c6f5 in tcmalloc::ThreadCache::Deallocate 
(cl=, ptr=0x5565a57f5c00, this=0x5565a3cd8a40) at 
src/thread_cache.h:387
#10 (anonymous namespace)::do_free_helper 
(invalid_free_fn=0x7fb7b222cce0 <(anonymous namespace)::InvalidFree(void*)>, 
size_hint=0, use_hint=false, heap_must_be_valid=true, heap=0x5565a3cd8a40, 
ptr=0x5565a57f5c00) at src/tcmalloc.cc:1305
#11 (anonymous namespace)::do_free_with_callback 
(invalid_free_fn=0x7fb7b222cce0 <(anonymous namespace)::InvalidFree(void*)>, 
size_hint=0, use_hint=false, ptr=0x5565a57f5c00) at src/tcmalloc.cc:1337
#12 (anonymous namespace)::do_free (ptr=0x5565a57f5c00) at 
src/tcmalloc.cc:1345
#13 tc_free (ptr=0x5565a57f5c00) at src/tcmalloc.cc:1610
#14 0x7fb7b1fca164 in __gnu_cxx::new_allocator::deallocate 
(this=0x5565a5bf0880, __p=) at 
/usr/include/c++/7/ext/new_allocator.h:125
#15 std::allocator_traits >::deallocate (__a=..., 
__n=, __p=) at 
/usr/include/c++/7/bits/alloc_traits.h:462
#16 std::_Vector_base >::_M_deallocate 
(this=0x5565a5bf0880, __n=, __p=) at 
/usr/include/c++/7/bits/stl_vector.h:180
#17 std::_Vector_base >::~_Vector_base 
(this=0x5565a5bf0880, __in_chrg=) at 
/usr/include/c++/7/bits/stl_vector.h:162
#18 std::vector >::~vector 
(this=0x5565a5bf0880, __in_chrg=) at 
/usr/include/c++/7/bits/stl_vector.h:435
#19 MOSDOp::~MOSDOp (this=0x5565a5bf0600, __in_chrg=) at 
./src/messages/MOSDOp.h:195
#20 MOSDOp::~MOSDOp (this=0x5565a5bf0600, __in_chrg=) at 
./src/messages/MOSDOp.h:195
#21 0x7fb7a8ca6db7 in RefCountedObject::put (this=0x5565a5bf0600) 
at ./src/common/RefCountedObj.h:64
#22 0x7fb7a8f42d30 in ProtocolV2::write_message 
(this=this@entry=0x5565a5776000, m=m@entry=0x5565a5bf0600, 
more=more@entry=false) at ./src/msg/async/ProtocolV2.cc:571
#23 0x7fb7a8f56f0b in ProtocolV2::write_event (this=0x5565a5776000) 
at ./src/msg/async/ProtocolV2.cc:658
#24 0x7fb7a8f16263 in AsyncConnection::handle_write 
(this=0x5565a5763b00) at ./src/msg/async/AsyncConnection.cc:692
#25 0x7fb7a8f6a757 in EventCenter::process_events 
(this=this@entry=0x5565a43f2e00, timeout_microseconds=, 
timeout_microseconds@entry=3000, 
working_dur=working_dur@entry=0x7fb79e996be8) at ./src/msg/async/Event.cc:441
#26 0x7fb7a8f6ee48 in NetworkStackoperator() 
(__closure=0x5565a44c3958) at ./src/msg/async/Stack.cc:53
#27 std::_Function_handler >::_M_invoke(const std::_Any_data &) (__functor=...) at 
/usr/include/c++/7/bits/std_function.h:316
#28 0x7fb7a719f6df in ?? () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#29 0x7fb7a74726db in start_thread (arg=0x7fb79e999700) at 
pthread_create.c:463
#30 0x7fb7a685ca3f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Again, interaction between SLL_Pop(Range) and SLL_Next.

#4  tcmalloc::SLL_Next (t=0x0) at src/linked_list.h:45
#5  tcmalloc::SLL_PopRange (end=, start=, N=158, head=0x5565a3cd8bf0) at src/linked_list.h:76

Same as previous 2 cases, same function/instruction/register/pointer:

(gdb) f 4


[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-03-29 Thread Mauricio Faria de Oliveira
coredump #3

(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x55bd9a04f6b0 in reraise_fatal (signum=11) at 
./src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at 
./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  tcmalloc::SLL_Next (t=0x0) at src/linked_list.h:45
#5  tcmalloc::SLL_PopRange (end=, start=, N=176, head=0x55bd9c1d2bf0) at src/linked_list.h:76
#6  tcmalloc::ThreadCache::FreeList::PopRange (end=, 
start=, N=176, this=0x55bd9c1d2bf0) at src/thread_cache.h:225
#7  tcmalloc::ThreadCache::ReleaseToCentralCache 
(this=this@entry=0x55bd9c1d2a40, src=src@entry=0x55bd9c1d2bf0, cl=, N=176, N@entry=273) at src/thread_cache.cc:195
#8  0x7f036eed4c9b in tcmalloc::ThreadCache::ListTooLong 
(this=this@entry=0x55bd9c1d2a40, list=0x55bd9c1d2bf0, cl=) at 
src/thread_cache.cc:157
#9  0x7f036eee36f5 in tcmalloc::ThreadCache::Deallocate 
(cl=, ptr=0x55bd9e12e0f0, this=0x55bd9c1d2a40) at 
src/thread_cache.h:387
#10 (anonymous namespace)::do_free_helper 
(invalid_free_fn=0x7f036eec3ce0 <(anonymous namespace)::InvalidFree(void*)>, 
size_hint=0, use_hint=false, heap_must_be_valid=true, heap=0x55bd9c1d2a40, 
ptr=0x55bd9e12e0f0) at src/tcmalloc.cc:1305
#11 (anonymous namespace)::do_free_with_callback 
(invalid_free_fn=0x7f036eec3ce0 <(anonymous namespace)::InvalidFree(void*)>, 
size_hint=0, use_hint=false, ptr=0x55bd9e12e0f0) at src/tcmalloc.cc:1337
#12 (anonymous namespace)::do_free (ptr=0x55bd9e12e0f0) at 
src/tcmalloc.cc:1345
#13 tc_free (ptr=0x55bd9e12e0f0) at src/tcmalloc.cc:1610
#14 0x7f036ec61164 in __gnu_cxx::new_allocator::deallocate 
(this=0x55bd9c982c80, __p=) at 
/usr/include/c++/7/ext/new_allocator.h:125
#15 std::allocator_traits >::deallocate (__a=..., 
__n=, __p=) at 
/usr/include/c++/7/bits/alloc_traits.h:462
#16 std::_Vector_base >::_M_deallocate 
(this=0x55bd9c982c80, __n=, __p=) at 
/usr/include/c++/7/bits/stl_vector.h:180
#17 std::_Vector_base >::~_Vector_base 
(this=0x55bd9c982c80, __in_chrg=) at 
/usr/include/c++/7/bits/stl_vector.h:162
#18 std::vector >::~vector 
(this=0x55bd9c982c80, __in_chrg=) at 
/usr/include/c++/7/bits/stl_vector.h:435
#19 MOSDOp::~MOSDOp (this=0x55bd9c982a00, __in_chrg=) at 
./src/messages/MOSDOp.h:195
#20 MOSDOp::~MOSDOp (this=0x55bd9c982a00, __in_chrg=) at 
./src/messages/MOSDOp.h:195
#21 0x7f036593ddb7 in RefCountedObject::put (this=0x55bd9c982a00) 
at ./src/common/RefCountedObj.h:64
#22 0x7f0365bd9d30 in ProtocolV2::write_message 
(this=this@entry=0x55bd9dc68000, m=m@entry=0x55bd9c982a00, 
more=more@entry=false) at ./src/msg/async/ProtocolV2.cc:571
#23 0x7f0365bedf0b in ProtocolV2::write_event (this=0x55bd9dc68000) 
at ./src/msg/async/ProtocolV2.cc:658
#24 0x7f0365bad263 in AsyncConnection::handle_write 
(this=0x55bd9dc58900) at ./src/msg/async/AsyncConnection.cc:692
#25 0x7f0365c01757 in EventCenter::process_events 
(this=this@entry=0x55bd9c8ece00, timeout_microseconds=, 
timeout_microseconds@entry=3000, 
working_dur=working_dur@entry=0x7f035b62dbe8) at ./src/msg/async/Event.cc:441
#26 0x7f0365c05e48 in NetworkStackoperator() 
(__closure=0x55bd9c9bd958) at ./src/msg/async/Stack.cc:53
#27 std::_Function_handler >::_M_invoke(const std::_Any_data &) (__functor=...) at 
/usr/include/c++/7/bits/std_function.h:316
#28 0x7f0363e366df in ?? () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#29 0x7f03641096db in start_thread (arg=0x7f035b630700) at 
pthread_create.c:463
#30 0x7f03634f3a3f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Again, interaction between SLL_Pop(Range) and SLL_Next.

#4  tcmalloc::SLL_Next (t=0x0) at src/linked_list.h:45
#5  tcmalloc::SLL_PopRange (end=, start=, N=176, head=0x55bd9c1d2bf0) at src/linked_list.h:76

Same as previous case, same function/instruction/register/pointer:

(gdb) f 4

(gdb) x/i $rip
=> 0x7f036eed4bcb <tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+219>: mov (%rdx),%rdx

(gdb) x $rdx
   0x0: Cannot access memory at address 0x0


[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-03-29 Thread Mauricio Faria de Oliveira
coredump #1

(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x555f023cf6b0 in reraise_fatal (signum=11) at 
./src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at 
./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  tcmalloc::SLL_Next (t=0x6ee6555f) at src/linked_list.h:45
#5  tcmalloc::SLL_Pop (list=0x555f03530628) at src/linked_list.h:59
#6  tcmalloc::ThreadCache::FreeList::Pop (this=) at 
src/thread_cache.h:212
#7  tcmalloc::ThreadCache::Allocate (cl=, 
size=, this=) at src/thread_cache.h:365
#8  (anonymous namespace)::do_memalign (align=align@entry=8, 
size=, size@entry=4096) at src/tcmalloc.cc:1462
#9  0x7f1d87d08379 in (anonymous 
namespace)::do_memalign_or_cpp_memalign (size=4096, align=8) at 
src/tcmalloc.cc:1131
#10 tc_posix_memalign (result_ptr=result_ptr@entry=0x7f1d74c51310, 
align=align@entry=8, size=size@entry=4096) at src/tcmalloc.cc:1781
#11 0x7f1d7ea60a86 in ceph::buffer::v14_2_0::raw_combined::create 
(mempool=10, align=8, len=4000) at ./src/common/buffer.cc:121
#12 ceph::buffer::v14_2_0::list::refill_append_space 
(this=this@entry=0x7f1d74c51480, len=len@entry=1) at ./src/common/buffer.cc:1442
#13 0x7f1d7ea6197a in ceph::buffer::v14_2_0::list::append 
(this=0x7f1d74c51480, data=0x7f1d74c513e0 "\b\024\305t\035\177", 
len=) at ./src/common/buffer.cc:1470
#14 0x7f1d7ea02f2c in ceph::encode_raw (bl=..., 
t=@0x7f1d74c513e0: 8 '\b') at ./src/include/encoding.h:73
#15 ceph::encode (features=720575940647714820, bl=..., 
v=@0x7f1d74c513e0: 8 '\b') at ./src/include/encoding.h:85
#16 ceph::msgr::v2::ControlFrame::_encode_payload_each (
t=@0x7f1d74c513e0: 8 '\b', this=0x7f1d74c51480) at 
./src/msg/async/frames_v2.h:426
#17 ceph::msgr::v2::ControlFrame::_encode (args#1=..., 
args#0=@0x7f1d74c513e0: 8 '\b', this=0x7f1d74c51480) at 
./src/msg/async/frames_v2.h:460
#18 ceph::msgr::v2::ControlFrame::Encode (args#1=..., 
args#0=@0x7f1d74c513e0: 8 '\b', this=) at 
./src/msg/async/frames_v2.h:471
#19 ProtocolV2::_handle_peer_banner_payload (this=0x555f054c, 
buffer=..., r=) at ./src/msg/async/ProtocolV2.cc:942
#20 0x7f1d7ea011a4 in ProtocolV2::run_continuation 
(this=0x555f054c, continuation=...) at ./src/msg/async/ProtocolV2.cc:47
#21 0x7f1d7e9ce0e6 in std::function::operator()(char*, long) const (__args#1=, 
__args#0=, this=0x555f054b8410) at 
/usr/include/c++/7/bits/std_function.h:706
#22 AsyncConnection::process (this=0x555f054b8000) at 
./src/msg/async/AsyncConnection.cc:450
#23 0x7f1d7ea241cd in EventCenter::process_events 
(this=this@entry=0x555f03c4a980, timeout_microseconds=, 
timeout_microseconds@entry=3000, 
working_dur=working_dur@entry=0x7f1d74c51be8) at ./src/msg/async/Event.cc:415
#24 0x7f1d7ea28e48 in NetworkStackoperator() 
(__closure=0x555f03d1b988) at ./src/msg/async/Stack.cc:53
#25 std::_Function_handler >::_M_invoke(const std::_Any_data &) (__functor=...)
at /usr/include/c++/7/bits/std_function.h:316
#26 0x7f1d7cc596df in ?? () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#27 0x7f1d7cf2c6db in start_thread (arg=0x7f1d74c54700) at 
pthread_create.c:463
#28 0x7f1d7c316a3f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

See:
#4  tcmalloc::SLL_Next (t=0x6ee6555f) at src/linked_list.h:45
#5  tcmalloc::SLL_Pop (list=0x555f03530628) at src/linked_list.h:59

The pointer in frame 4 is bogus:

(gdb) x 0x555f03530628
0x555f03530628: 0x555f

(gdb) x 0x6ee6555f
0x6ee6555f: Cannot access memory at address 0x6ee6555f


[Bug 1921749] Re: nautilus: ceph radosgw beast frontend coroutine stack corruption

2021-03-29 Thread Mauricio Faria de Oliveira
coredump #2

(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x557c23db26b0 in reraise_fatal (signum=11) at 
./src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at 
./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  tcmalloc::SLL_Next (t=0x0) at src/linked_list.h:45
#5  tcmalloc::SLL_PopRange (end=, start=, N=27, head=0x557c26146d28) at src/linked_list.h:76
#6  tcmalloc::ThreadCache::FreeList::PopRange (end=, 
start=, N=27, this=0x557c26146d28) at src/thread_cache.h:225
#7  tcmalloc::ThreadCache::ReleaseToCentralCache 
(this=this@entry=0x557c26146a40, src=src@entry=0x557c26146d28, cl=, N=27, N@entry=85) at src/thread_cache.cc:195
#8  0x7feb0741dc9b in tcmalloc::ThreadCache::ListTooLong 
(this=this@entry=0x557c26146a40, list=0x557c26146d28, cl=) at 
src/thread_cache.cc:157
#9  0x7feb0742c6f5 in tcmalloc::ThreadCache::Deallocate 
(cl=, ptr=0x557c280c7800, this=0x557c26146a40) at 
src/thread_cache.h:387
#10 (anonymous namespace)::do_free_helper 
(invalid_free_fn=0x7feb0740cce0 <(anonymous namespace)::InvalidFree(void*)>, 
size_hint=0, use_hint=false, heap_must_be_valid=true, heap=0x557c26146a40, 
ptr=0x557c280c7800) at src/tcmalloc.cc:1305
#11 (anonymous namespace)::do_free_with_callback 
(invalid_free_fn=0x7feb0740cce0 <(anonymous namespace)::InvalidFree(void*)>, 
size_hint=0, use_hint=false, ptr=0x557c280c7800) at src/tcmalloc.cc:1337
#12 (anonymous namespace)::do_free (ptr=0x557c280c7800) at 
src/tcmalloc.cc:1345
#13 tc_free (ptr=0x557c280c7800) at src/tcmalloc.cc:1610
#14 0x7feb07151027 in RefCountedObject::put (this=0x557c280c7800) 
at ./src/common/RefCountedObj.h:64
#15 0x7feb07185a7a in Objecter::_finish_op 
(this=this@entry=0x557c2754f080, op=op@entry=0x557c280c7800, r=r@entry=0) at 
./src/osdc/Objecter.cc:3147
#16 0x7feb0718e50f in Objecter::handle_osd_op_reply 
(this=this@entry=0x557c2754f080, m=m@entry=0x557c280ae580) at 
./src/osdc/Objecter.cc:3528
#17 0x7feb0718f733 in Objecter::ms_dispatch (this=0x557c2754f080, 
m=0x557c280ae580) at ./src/osdc/Objecter.cc:966
#18 0x7feb071aa322 in non-virtual thunk to 
Objecter::ms_fast_dispatch(Message*) () at ./src/osdc/Objecter.h:2110
#19 0x7feafe03aa7a in Messenger::ms_fast_dispatch (m=..., 
this=) at ./src/msg/Messenger.h:665
#20 DispatchQueue::fast_dispatch (this=0x557c26970c58, m=...) at 
./src/msg/DispatchQueue.cc:72
#21 0x7feafe12b432 in DispatchQueue::fast_dispatch 
(m=0x557c280ae580, this=) at ./src/msg/DispatchQueue.h:204
#22 ProtocolV2::handle_message (this=this@entry=0x557c27bf5100) at 
./src/msg/async/ProtocolV2.cc:1462
#23 0x7feafe13d1f0 in ProtocolV2::handle_read_frame_dispatch 
(this=this@entry=0x557c27bf5100) at ./src/msg/async/ProtocolV2.cc:1128
#24 0x7feafe13d349 in ProtocolV2::_handle_read_frame_epilogue_main 
(this=this@entry=0x557c27bf5100) at ./src/msg/async/ProtocolV2.cc:1316
#25 0x7feafe13ec29 in ProtocolV2::handle_read_frame_epilogue_main 
(this=0x557c27bf5100, buffer=..., r=) at 
./src/msg/async/ProtocolV2.cc:1291
#26 0x7feafe1271a4 in ProtocolV2::run_continuation 
(this=0x557c27bf5100, continuation=...) at ./src/msg/async/ProtocolV2.cc:47
#27 0x7feafe0f40e6 in std::function::operator()(char*, long) const (__args#1=, 
__args#0=, this=0x557c27be1e90) at 
/usr/include/c++/7/bits/std_function.h:706
#28 AsyncConnection::process (this=0x557c27be1a80) at 
./src/msg/async/AsyncConnection.cc:450
#29 0x7feafe14a1cd in EventCenter::process_events 
(this=this@entry=0x557c26860e00, timeout_microseconds=, 
timeout_microseconds@entry=3000, 
working_dur=working_dur@entry=0x7feaf3b76be8) at ./src/msg/async/Event.cc:415
#30 0x7feafe14ee48 in NetworkStackoperator() 
(__closure=0x557c26931958) at ./src/msg/async/Stack.cc:53
#31 std::_Function_handler >::_M_invoke(const std::_Any_data &) (__functor=...) at 
/usr/include/c++/7/bits/std_function.h:316
#32 0x7feafc37f6df in ?? () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#33 0x7feafc6526db in start_thread (arg=0x7feaf3b79700) at 
pthread_create.c:463
#34 0x7feafba3ca3f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Similar interaction between SLL_Pop(Range) and SLL_Next.

#4  tcmalloc::SLL_Next (t=0x0) at src/linked_list.h:45
#5  tcmalloc::SLL_PopRange (end=, start=, N=27, head=0x557c26146d28) at src/linked_list.h:76

The instruction reads memory pointed to by register RDX (into RDX itself):

(gdb) f 4

(gdb) x/i $rip
=> 0x7feb0741dbcb <tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+219>: mov (%rdx),%rdx

But RDX is NULL, an invalid pointer to begin with:

(gdb) x $rdx
   0x0: Cannot access memory at address 0x0
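
For context, frames 4/5 are tcmalloc's singly-linked freelist helpers,
paraphrased below from gperftools src/linked_list.h (a sketch, comments
added). A free object's first word stores the pointer to the next free
object, so popping from a freelist whose head word was overwritten (here,
by the overflowing coroutine stack) dereferences a bogus pointer, which is
the mov (%rdx),%rdx fault above:

  // Paraphrased from gperftools src/linked_list.h (sketch).
  static inline void *SLL_Next(void *t) {
    // The faulting load: the first word of a free object is the 'next'
    // pointer; this compiles to the mov (%rdx),%rdx seen above.
    return *(reinterpret_cast<void**>(t));
  }

  static inline void *SLL_Pop(void **list) {
    void *result = *list;
    *list = SLL_Next(result);  // crashes if *list was smashed to 0x0/garbage
    return result;
  }

SLL_PopRange walks N elements the same way via SLL_Next, which is why the
ReleaseToCentralCache paths in the other coredumps fault at the same
instruction.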
