Re: [ceph-users] cephfs-snapshots causing mds failover, hangs
hi Zheng,

On 8/26/19 3:31 PM, Yan, Zheng wrote:
[…]
> change code to : […]

we can happily confirm that this resolves the issue.

thank you _very_ much & with kind regards,
t.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs-snapshots causing mds failover, hangs
On Mon, Aug 26, 2019 at 9:25 PM thoralf schulze wrote:
>
> hi Zheng -
>
> On 8/26/19 2:55 PM, Yan, Zheng wrote:
> > I tracked down the bug
> > https://tracker.ceph.com/issues/41434
>
> wow, that was quick - thank you for investigating. we are looking
> forward to the fix :-)
>
> in the meantime, is there anything we can do to prevent q ==
> p->second.end() from happening? something along the lines of
>
> --- a/src/mds/Locker.cc
> +++ b/src/mds/Locker.cc
> @@ -1867,6 +1867,8 @@ void Locker::file_update_finish(CInode *in, MutationRef& mut, unsigned flags,
>    auto p = in->client_snap_caps.begin();
>    while (p != in->client_snap_caps.end()) {
>      auto q = p->second.find(client);
> +    // horrible hack for zecm
> +    std::this_thread::sleep_for(std::chrono::milliseconds(1));
>      if (q != p->second.end()) {
>        SimpleLock *lock = in->get_lock(p->first);
>        ceph_assert(lock);
>
> thank you very much & with kind regards,
> t.

change code to :

  if (q != p->second.end()) {
  } else {
    ++p;
  }
Re: [ceph-users] cephfs-snapshots causing mds failover, hangs
hi Zheng -

On 8/26/19 2:55 PM, Yan, Zheng wrote:
> I tracked down the bug
> https://tracker.ceph.com/issues/41434

wow, that was quick - thank you for investigating. we are looking forward to the fix :-)

in the meantime, is there anything we can do to prevent q == p->second.end() from happening? something along the lines of

--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -1867,6 +1867,8 @@ void Locker::file_update_finish(CInode *in, MutationRef& mut, unsigned flags,
   auto p = in->client_snap_caps.begin();
   while (p != in->client_snap_caps.end()) {
     auto q = p->second.find(client);
+    // horrible hack for zecm
+    std::this_thread::sleep_for(std::chrono::milliseconds(1));
     if (q != p->second.end()) {
       SimpleLock *lock = in->get_lock(p->first);
       ceph_assert(lock);

thank you very much & with kind regards,
t.
Re: [ceph-users] cephfs-snapshots causing mds failover, hangs
On Mon, Aug 26, 2019 at 6:57 PM thoralf schulze wrote:
>
> hi Zheng,
>
> On 8/21/19 4:32 AM, Yan, Zheng wrote:
> > Please enable debug mds (debug_mds=10), and try reproducing it again.
>
> please find the logs at
> https://www.user.tu-berlin.de/thoralf.schulze/ceph-debug.tar.xz .
>
> we managed to reproduce the issue as a worst-case scenario: before
> snapshotting, juju-d0f708-5-lxd-1 and juju-d0f708-10-lxd-1 were the
> active mds's and juju-d0f708-3-lxd-1 and juju-d0f708-9-lxd-1 the
> standbys. we created the snapshot at ~08:11:50; a little later the
> failover happened and juju-d0f708-5-lxd-1 and juju-d0f708-10-lxd-1
> went MIA. a little later still, the now-active juju-d0f708-3-lxd-1 and
> juju-d0f708-9-lxd-1 mds's dropped out of the cluster as well. we
> started to restart all mds daemons at ~08:16.
>
> thank you very much & with kind regards,
> t.

I tracked down the bug: https://tracker.ceph.com/issues/41434

Thanks
Yan, Zheng
Re: [ceph-users] cephfs-snapshots causing mds failover, hangs
hi Zheng,

On 8/21/19 4:32 AM, Yan, Zheng wrote:
> Please enable debug mds (debug_mds=10), and try reproducing it again.

please find the logs at https://www.user.tu-berlin.de/thoralf.schulze/ceph-debug.tar.xz .

we managed to reproduce the issue as a worst-case scenario: before snapshotting, juju-d0f708-5-lxd-1 and juju-d0f708-10-lxd-1 were the active mds's and juju-d0f708-3-lxd-1 and juju-d0f708-9-lxd-1 the standbys. we created the snapshot at ~08:11:50; a little later the failover happened and juju-d0f708-5-lxd-1 and juju-d0f708-10-lxd-1 went MIA. a little later still, the now-active juju-d0f708-3-lxd-1 and juju-d0f708-9-lxd-1 mds's dropped out of the cluster as well. we started to restart all mds daemons at ~08:16.

thank you very much & with kind regards,
t.
Re: [ceph-users] cephfs-snapshots causing mds failover, hangs
hi zheng,

On 8/21/19 4:32 AM, Yan, Zheng wrote:
> Please enable debug mds (debug_mds=10), and try reproducing it again.

we will get back with the logs on monday.

thank you & with kind regards,
t.
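For readers reproducing this: the requested verbosity could be set persistently in ceph.conf on the mds hosts, roughly as sketched below (the option name `debug mds` is the standard Ceph one, but verify the exact syntax against the documentation for your release before relying on it):

```
[mds]
debug mds = 10
```

followed by an mds daemon restart; alternatively, something like `ceph tell mds.\* injectargs '--debug_mds=10'` should apply it at runtime without a restart.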
Re: [ceph-users] cephfs-snapshots causing mds failover, hangs
On Tue, Aug 20, 2019 at 9:43 PM thoralf schulze wrote:
>
> hi there,
>
> we are struggling with the creation of cephfs snapshots: doing so
> reproducibly causes a failover of our metadata servers. afterwards,
> the demoted mds servers won't be available as standby servers, and the
> mds daemons on these machines have to be manually restarted. more
> often than we wish, the failover fails altogether, resulting in an
> unresponsive cephfs.

Please enable debug mds (debug_mds=10), and try reproducing it again.

Regards
Yan, Zheng

> this is with mimic 13.2.6 and a single cephfs. we are running 4 mds
> servers with plenty of cpu and ram resources in a multi-active setup
> with 2 active and 2 standby mds's:
>
> mds: ceph-fs-2/2/2 up
> {0=juju-d0f708-3-lxd-1=up:active,1=juju-d0f708-5-lxd-1=up:active}, 2
> up:standby-replay
>
> is the transition from active to the standby mds servers intended? and
> if not: how can we prevent it?
> we could live with the failover if the ex-active mds's were still part
> of the cluster afterwards, but this is not the case. on top of that,
> the failover is not 100% reliable - if it fails, the newly active
> mds's exhibit the same symptoms as the failed ones: they just sit
> around complaining about "MDS internal heartbeat is not healthy!".
>
> strace'ing the mds processes on the ex-active mds's shows that they
> are mostly waiting for some futexes to become available. we also found
> that the issue gets alleviated a bit by raising mds_cache_memory_limit
> from its default of 1gb to 32gb - in this case, the failover has a
> higher chance to succeed.
>
> below are some logs from a successful failover - juju-d0f708-9-lxd-1
> and juju-d0f708-10-lxd-1 were the active mds's and juju-d0f708-3-lxd-1
> and juju-d0f708-5-lxd-1 the standbys. sorry for being very verbose, i
> don't want to withhold any information that might be necessary to
> debug this issue … if it helps, i can also provide the logs for
> juju-d0f708-10-lxd-1 and juju-d0f708-5-lxd-1 and the output of ceph
> daemon perf dump on all mds's before and after the issue occurs.
>
> thank you very much & with kind regards,
> t.
>
> --- logs ---
>
> ceph-mgr.log on a mon:
>
> 2019-08-20 09:18:23.642 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f2184000 mgrreport(mds.juju-d0f708-9-lxd-1 +0-0 packed 1374) v6 from mds.0 172.28.9.20:6800/2237168008
> 2019-08-20 09:18:23.646 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-9-lxd-1: (22) Invalid argument
> 2019-08-20 09:18:55.781 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f256e700 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 1374) v6 from mds.1 172.28.9.21:6800/2008779502
> 2019-08-20 09:18:55.781 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-10-lxd-1: (22) Invalid argument
> 2019-08-20 09:21:26.562 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f8780a00 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 6) v6 from mds.? 172.28.9.21:6800/1319885328
> 2019-08-20 09:21:26.562 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-10-lxd-1: (22) Invalid argument
> 2019-08-20 09:21:27.558 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f5666a00 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 6) v6 from mds.? 172.28.9.21:6800/1319885328
> 2019-08-20 09:21:27.562 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-10-lxd-1: (22) Invalid argument
> 2019-08-20 09:21:28.558 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f2e3ee00 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 6) v6 from mds.? 172.28.9.21:6800/1319885328
> 2019-08-20 09:21:28.562 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-10-lxd-1: (22) Invalid argument
> 2019-08-20 09:21:29.558 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614efa71880 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 6) v6 from mds.? 172.28.9.21:6800/1319885328
> [… more of these]
>
> ceph-mds.log on juju-d0f708-9-lxd-1 (ex active rank 0):
>
> 2019-08-20 09:17:44.824 7f7d3a138700 5 mds.beacon.juju-d0f708-9-lxd-1 Sending beacon up:active seq 59
> 2019-08-20 09:17:44.824 7f7d3fb2b700 5 mds.beacon.juju-d0f708-9-lxd-1 received beacon reply up:active seq 59 rtt 0
> 2019-08-20 09:17:45.020 7f7d3db27700 4 mds.0.server handle_client_request client_request(client.89193:898963 getattr pAsLsXsFs #0x100221e 2019-08-20 09:17:45.021386 caller_uid=0, caller_gid=0{}) v2
> [… lots of these …]
> 2019-08-20 09:17:45.092 7f7d36130700 5 mds.0.log _submit_thread 109124009784~1190 : EUpdate cap update [metablob 0x14c, 1 dirs]
> 2019-08-20 09:17:45.092 7f7d3db27700 4 mds.0.server handle_client_request client_request(client.49306:901053 getattr pAsLsXsFs #0x100221f 2019-08-20 09:17:45.093085 caller_uid=0, caller_gid=0{}) v2
> [… lots of these …]
> 2019-08-20 09:17:45.260 7f7d36130700 5 mds
[ceph-users] cephfs-snapshots causing mds failover, hangs
hi there,

we are struggling with the creation of cephfs snapshots: doing so reproducibly causes a failover of our metadata servers. afterwards, the demoted mds servers won't be available as standby servers, and the mds daemons on these machines have to be manually restarted. more often than we wish, the failover fails altogether, resulting in an unresponsive cephfs.

this is with mimic 13.2.6 and a single cephfs. we are running 4 mds servers with plenty of cpu and ram resources in a multi-active setup with 2 active and 2 standby mds's:

mds: ceph-fs-2/2/2 up {0=juju-d0f708-3-lxd-1=up:active,1=juju-d0f708-5-lxd-1=up:active}, 2 up:standby-replay

is the transition from active to the standby mds servers intended? and if not: how can we prevent it? we could live with the failover if the ex-active mds's were still part of the cluster afterwards, but this is not the case. on top of that, the failover is not 100% reliable - if it fails, the newly active mds's exhibit the same symptoms as the failed ones: they just sit around complaining about "MDS internal heartbeat is not healthy!".

strace'ing the mds processes on the ex-active mds's shows that they are mostly waiting for some futexes to become available. we also found that the issue gets alleviated a bit by raising mds_cache_memory_limit from its default of 1gb to 32gb - in this case, the failover has a higher chance to succeed.

below are some logs from a successful failover - juju-d0f708-9-lxd-1 and juju-d0f708-10-lxd-1 were the active mds's and juju-d0f708-3-lxd-1 and juju-d0f708-5-lxd-1 the standbys. sorry for being very verbose, i don't want to withhold any information that might be necessary to debug this issue … if it helps, i can also provide the logs for juju-d0f708-10-lxd-1 and juju-d0f708-5-lxd-1 and the output of ceph daemon perf dump on all mds's before and after the issue occurs.

thank you very much & with kind regards,
t.
--- logs ---

ceph-mgr.log on a mon:

2019-08-20 09:18:23.642 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f2184000 mgrreport(mds.juju-d0f708-9-lxd-1 +0-0 packed 1374) v6 from mds.0 172.28.9.20:6800/2237168008
2019-08-20 09:18:23.646 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-9-lxd-1: (22) Invalid argument
2019-08-20 09:18:55.781 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f256e700 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 1374) v6 from mds.1 172.28.9.21:6800/2008779502
2019-08-20 09:18:55.781 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-10-lxd-1: (22) Invalid argument
2019-08-20 09:21:26.562 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f8780a00 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 6) v6 from mds.? 172.28.9.21:6800/1319885328
2019-08-20 09:21:26.562 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-10-lxd-1: (22) Invalid argument
2019-08-20 09:21:27.558 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f5666a00 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 6) v6 from mds.? 172.28.9.21:6800/1319885328
2019-08-20 09:21:27.562 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-10-lxd-1: (22) Invalid argument
2019-08-20 09:21:28.558 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614f2e3ee00 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 6) v6 from mds.? 172.28.9.21:6800/1319885328
2019-08-20 09:21:28.562 7f5684835700 1 mgr finish mon failed to return metadata for mds.juju-d0f708-10-lxd-1: (22) Invalid argument
2019-08-20 09:21:29.558 7f5679639700 0 ms_deliver_dispatch: unhandled message 0x5614efa71880 mgrreport(mds.juju-d0f708-10-lxd-1 +0-0 packed 6) v6 from mds.? 172.28.9.21:6800/1319885328
[… more of these]

ceph-mds.log on juju-d0f708-9-lxd-1 (ex active rank 0):

2019-08-20 09:17:44.824 7f7d3a138700 5 mds.beacon.juju-d0f708-9-lxd-1 Sending beacon up:active seq 59
2019-08-20 09:17:44.824 7f7d3fb2b700 5 mds.beacon.juju-d0f708-9-lxd-1 received beacon reply up:active seq 59 rtt 0
2019-08-20 09:17:45.020 7f7d3db27700 4 mds.0.server handle_client_request client_request(client.89193:898963 getattr pAsLsXsFs #0x100221e 2019-08-20 09:17:45.021386 caller_uid=0, caller_gid=0{}) v2
[… lots of these …]
2019-08-20 09:17:45.092 7f7d36130700 5 mds.0.log _submit_thread 109124009784~1190 : EUpdate cap update [metablob 0x14c, 1 dirs]
2019-08-20 09:17:45.092 7f7d3db27700 4 mds.0.server handle_client_request client_request(client.49306:901053 getattr pAsLsXsFs #0x100221f 2019-08-20 09:17:45.093085 caller_uid=0, caller_gid=0{}) v2
[… lots of these …]
2019-08-20 09:17:45.260 7f7d36130700 5 mds.0.log _submit_thread 109124010994~107 : ETableServer snaptable prepare reqid 2 mds.0 tid 98 version 98 mutation=43 bytes
2019-08-20 09:17:45.264 7f7d36130700 5 mds.0.log _submit_thread 109124011121~11940 : EUpdate mksnap [metablob 0x1, 1 dirs table_tids=^A,98]
2019-08-20 09:17:45.272 7f7d36130700 5 mds.0.log _submit_thread