Re: [ceph-users] Problematic inode preventing ceph-mds from starting
Hi! Yes, resetting journals is exactly what we did, quite a while ago, when the MDS ran out of memory because a journal entry had an absurdly large number in it (I think it may have been an inode number). We probably also reset the inode table later, which I recently learned resets a data structure on disk, and probably started us overwriting inodes or dentries or both.

So I take it (we are learning about filesystems very quickly over here) that Ceph is reusing inode numbers. Re-scanning dentries will somehow figure out which dentry is most recent and remove the older (now wrong) one. And somehow it can handle hard links, possibly (we don't have many, or any, of these).

Thanks very much for your help. This has been fascinating.

Neale

From: Patrick Donnelly
Sent: Monday, October 28, 2019 12:52
To: Pickett, Neale T
Cc: ceph-users
Subject: Re: [ceph-users] Problematic inode preventing ceph-mds from starting

On Fri, Oct 25, 2019 at 12:11 PM Pickett, Neale T wrote:
> In the last week we have made a few changes to the down filesystem in an
> attempt to fix what we thought was an inode problem:
>
> cephfs-data-scan scan_extents   # about 1 day with 64 processes
> cephfs-data-scan scan_inodes    # about 1 day with 64 processes
> cephfs-data-scan scan_links     # about 1 day

Did you reset the journals or perform any other disaster recovery commands? This process likely introduced the duplicate inodes.

> After these three, we tried to start an MDS and it stayed up. We then ran:
>
> ceph tell mds.a scrub start / recursive repair
>
> The repair ran about 3 days, spewing logs to `ceph -w` about duplicated
> inodes, until it stopped. All looked well until we began bringing production
> services back online, at which point many error messages appeared, the mds
> went back into damaged, and the fs back to degraded. At this point I removed
> the objects you suggested, which brought everything back briefly.
>
> The latest crash is:
>
> -1> 2019-10-25 18:47:50.731 7fc1f3b56700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
> In function 'void MDCache::add_inode(CInode*)' thread 7fc1f3b56700 time 2019-1...
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
> 258: FAILED ceph_assert(!p)

This error indicates a duplicate inode loaded into cache. Fixing this probably requires significant intervention and (meta)data loss for recent changes:

- Stop/unmount all clients. (Probably already the case if the rank is damaged!)
- Reset the MDS journal [1] and optionally recover any dentries first. (This will hopefully resolve the ESubtreeMap errors you pasted.) Note that some metadata may be lost through this command.
- `cephfs-data-scan scan_links` again. This should repair any duplicate inodes (by dropping the older dentries).
- Then you can try marking the rank as repaired.

Good luck!

[1] https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
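For reference, the earlier resets Neale describes would typically have looked something like the following. This is a reconstruction, not the exact commands that were run, and it assumes a filesystem named cephfs with a single rank 0; both commands discard metadata and belong strictly to disaster recovery:

  # Truncate the rank-0 MDS journal (loses any un-replayed metadata updates).
  cephfs-journal-tool --rank=cephfs:0 journal reset

  # Reset the on-disk inode allocation table. After this the MDS can hand out
  # inode numbers that existing dentries already reference, which is how the
  # duplicate inodes discussed in this thread can appear.
  cephfs-table-tool all reset inode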
Re: [ceph-users] Problematic inode preventing ceph-mds from starting
On Fri, Oct 25, 2019 at 12:11 PM Pickett, Neale T wrote:
> In the last week we have made a few changes to the down filesystem in an
> attempt to fix what we thought was an inode problem:
>
> cephfs-data-scan scan_extents   # about 1 day with 64 processes
> cephfs-data-scan scan_inodes    # about 1 day with 64 processes
> cephfs-data-scan scan_links     # about 1 day

Did you reset the journals or perform any other disaster recovery commands? This process likely introduced the duplicate inodes.

> After these three, we tried to start an MDS and it stayed up. We then ran:
>
> ceph tell mds.a scrub start / recursive repair
>
> The repair ran about 3 days, spewing logs to `ceph -w` about duplicated
> inodes, until it stopped. All looked well until we began bringing production
> services back online, at which point many error messages appeared, the mds
> went back into damaged, and the fs back to degraded. At this point I removed
> the objects you suggested, which brought everything back briefly.
>
> The latest crash is:
>
> -1> 2019-10-25 18:47:50.731 7fc1f3b56700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
> In function 'void MDCache::add_inode(CInode*)' thread 7fc1f3b56700 time 2019-1...
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
> 258: FAILED ceph_assert(!p)

This error indicates a duplicate inode loaded into cache. Fixing this probably requires significant intervention and (meta)data loss for recent changes:

- Stop/unmount all clients. (Probably already the case if the rank is damaged!)
- Reset the MDS journal [1] and optionally recover any dentries first. (This will hopefully resolve the ESubtreeMap errors you pasted.) Note that some metadata may be lost through this command.
- `cephfs-data-scan scan_links` again. This should repair any duplicate inodes (by dropping the older dentries).
- Then you can try marking the rank as repaired.

Good luck!

[1] https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
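A minimal sketch of how the sequence Patrick outlines usually translates to commands, assuming the filesystem is named cephfs with rank 0; verify the flags against the disaster-recovery documentation linked above before running anything:

  # 1. With all clients stopped/unmounted, recover dentries from the journal, then reset it.
  cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
  cephfs-journal-tool --rank=cephfs:0 journal reset

  # 2. Re-run the link scan; it should drop the older of any duplicate dentries.
  cephfs-data-scan scan_links

  # 3. Mark rank 0 repaired so a standby MDS can take it over again.
  ceph mds repaired cephfs:0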
Re: [ceph-users] Problematic inode preventing ceph-mds from starting
[...] mds.frigga [ERR] replayed ESubtreeMap at 30403641312085 subtree root 0x1 is not mine in cache (it's -2,-2)
2019-10-25 18:48:05.728678 mds.frigga [ERR] replayed ESubtreeMap at 30403641478175 subtree root 0x1 is not mine in cache (it's -2,-2)
2019-10-25 18:48:05.734157 mds.frigga [ERR] failure replaying journal (EMetaBlob)
2019-10-25 18:48:06.676985 mon.coffee [ERR] Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2019-10-25 18:48:06.677063 mon.coffee [ERR] Health check failed: 1 mds daemon damaged (MDS_DAMAGE)

We now show mds: cephfs:0/1 5 up:standby, 1 damaged (we only have 5 mds servers).

This still smells like an inode problem to me, but I have completely run out of ideas, so I will do nothing more to ceph as I anxiously hope I am not fired for this 14-days-and-counting outage while awaiting a reply from the list.

Thank you very much!

Neale

From: Patrick Donnelly
Sent: Thursday, October 24, 2019 17:57
To: Pickett, Neale T
Cc: ceph-users
Subject: Re: [ceph-users] Problematic inode preventing ceph-mds from starting

Hi Neale,

On Fri, Oct 18, 2019 at 9:31 AM Pickett, Neale T wrote:
>
> Last week I asked about a rogue inode that was causing ceph-mds to segfault
> during replay. We didn't get any suggestions from this list, so we have been
> familiarizing ourselves with the ceph source code, and have added the
> following patch:
>
> --- a/src/mds/CInode.cc
> +++ b/src/mds/CInode.cc
> @@ -736,6 +736,13 @@ CDir *CInode::get_approx_dirfrag(frag_t fg)
>
>  CDir *CInode::get_or_open_dirfrag(MDCache *mdcache, frag_t fg)
>  {
> +  if (!is_dir()) {
> +    ostringstream oss;
> +    JSONFormatter f(true);

f.open_object_section("inode"); (otherwise the output is hard to read)

> +    dump(&f, DUMP_PATH | DUMP_INODE_STORE_BASE | DUMP_MDS_CACHE_OBJECT |
>          DUMP_LOCKS | DUMP_STATE | DUMP_CAPS | DUMP_DIRFRAGS);

f.close_object_section("inode")

> +    f.flush(oss);
> +    dout(0) << oss.str() << dendl;
> +  }
>    ceph_assert(is_dir());
>
>    // have it?

This seems like a generally useful patch. Feel free to submit a PR.
>
> This has given us a culprit:
>
> -2> 2019-10-18 16:19:06.934 7faefa470700  0 mds.0.cache.ino(0x1995e63)
> "/unimportant/path/we/can/tolerate/losing/compat.py"10995216789470"2018-03-24 03:18:17.621969""2018-03-24 03:18:17.620969"3318855521001
> { "dir_hash": 0 }
> { "stripe_unit": 4194304, "stripe_count": 1, "object_size": 4194304, "pool_id": 1, "pool_ns": "" }
> []
> 3411844674407370955161500"2015-01-27 16:01:52.467669""2018-03-24 03:18:17.621969"21-1[]
> { "version": 0, "mtime": "0.00", "num_files": 0, "num_subdirs": 0 }
> { "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.00" }
> { "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.00" }
> 2540123""""[]
> { "splits": [] }
> true{ "replicas": {} }
> { "authority": [ 0, -2 ], "replica_nonce": 0 }
> 0falsefalse{}
> 0{ "gather_set": [], "state": "lock", "is_leased": false, "num_rdlocks": 0, "num_wrlocks": 0, "num_xlocks": 0, "xlock_by": {} }
> {} {} {} {} {} {} {} {} {}
> [ "auth" ]
> []
> -1-1[]
> []
>
> -1> 2019-10-18 16:19:06.964 7faefa470700 -1
> /opt/app-root/src/ceph/src/mds/CInode.cc: In function 'CDir*
> CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7faefa470700 time
> 2019-10-18 16:19:06.934662
> /opt/app-root/src/ceph/src/mds/CInode.cc: 746: FAILED ceph_assert(is_dir())
>
> ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1aa) [0x7faf0a9ce39e]
> 2: (()+0x12a8620) [0x7faf0a9ce620]
> 3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x253) [0x557562a4b1ad]
> 4: (OpenFileTable::_prefetch_dirfrags()+0x4db) [0x557562b63d63]
> 5: (OpenFileTabl
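Neale mentions elsewhere in the thread that scan_extents and scan_inodes were run with 64 processes. A sketch of how that parallelization is usually spread over workers, assuming the data pool is named cephfs_data; scan_links is single-process and is not parallelized this way:

  # 64 workers, each scanning a disjoint slice of the data pool
  # (in practice spread across hosts rather than backgrounded on one machine).
  POOL=cephfs_data
  for i in $(seq 0 63); do
    cephfs-data-scan scan_extents --worker_n "$i" --worker_m 64 "$POOL" &
  done
  wait
  for i in $(seq 0 63); do
    cephfs-data-scan scan_inodes --worker_n "$i" --worker_m 64 "$POOL" &
  done
  wait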
Re: [ceph-users] Problematic inode preventing ceph-mds from starting
Hi Neale,

On Fri, Oct 18, 2019 at 9:31 AM Pickett, Neale T wrote:
>
> Last week I asked about a rogue inode that was causing ceph-mds to segfault
> during replay. We didn't get any suggestions from this list, so we have been
> familiarizing ourselves with the ceph source code, and have added the
> following patch:
>
> --- a/src/mds/CInode.cc
> +++ b/src/mds/CInode.cc
> @@ -736,6 +736,13 @@ CDir *CInode::get_approx_dirfrag(frag_t fg)
>
>  CDir *CInode::get_or_open_dirfrag(MDCache *mdcache, frag_t fg)
>  {
> +  if (!is_dir()) {
> +    ostringstream oss;
> +    JSONFormatter f(true);

f.open_object_section("inode"); (otherwise the output is hard to read)

> +    dump(&f, DUMP_PATH | DUMP_INODE_STORE_BASE | DUMP_MDS_CACHE_OBJECT |
>          DUMP_LOCKS | DUMP_STATE | DUMP_CAPS | DUMP_DIRFRAGS);

f.close_object_section("inode")

> +    f.flush(oss);
> +    dout(0) << oss.str() << dendl;
> +  }
>    ceph_assert(is_dir());
>
>    // have it?

This seems like a generally useful patch. Feel free to submit a PR.

> This has given us a culprit:
>
> -2> 2019-10-18 16:19:06.934 7faefa470700  0 mds.0.cache.ino(0x1995e63)
> "/unimportant/path/we/can/tolerate/losing/compat.py"10995216789470"2018-03-24 03:18:17.621969""2018-03-24 03:18:17.620969"3318855521001
> { "dir_hash": 0 }
> { "stripe_unit": 4194304, "stripe_count": 1, "object_size": 4194304, "pool_id": 1, "pool_ns": "" }
> []
> 3411844674407370955161500"2015-01-27 16:01:52.467669""2018-03-24 03:18:17.621969"21-1[]
> { "version": 0, "mtime": "0.00", "num_files": 0, "num_subdirs": 0 }
> { "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.00" }
> { "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.00" }
> 2540123[]
> { "splits": [] }
> true{ "replicas": {} }
> { "authority": [ 0, -2 ], "replica_nonce": 0 }
> 0falsefalse{}
> 0{ "gather_set": [], "state": "lock", "is_leased": false, "num_rdlocks": 0, "num_wrlocks": 0, "num_xlocks": 0, "xlock_by": {} }
> {} {} {} {} {} {} {} {} {}
> [ "auth" ]
> []
> -1-1[]
> []
>
> -1> 2019-10-18 16:19:06.964 7faefa470700 -1
> /opt/app-root/src/ceph/src/mds/CInode.cc: In function 'CDir*
> CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7faefa470700 time
> 2019-10-18 16:19:06.934662
> /opt/app-root/src/ceph/src/mds/CInode.cc: 746: FAILED ceph_assert(is_dir())
>
> ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1aa) [0x7faf0a9ce39e]
> 2: (()+0x12a8620) [0x7faf0a9ce620]
> 3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x253) [0x557562a4b1ad]
> 4: (OpenFileTable::_prefetch_dirfrags()+0x4db) [0x557562b63d63]
> 5: (OpenFileTable::_open_ino_finish(inodeno_t, int)+0x16a) [0x557562b63720]
> 6: (C_OFT_OpenInoFinish::finish(int)+0x2d) [0x557562b67699]
> 7: (Context::complete(int)+0x27) [0x557562657fbf]
> 8: (MDSContext::complete(int)+0x152) [0x557562b04aa4]
> 9: (void finish_contexts std::allocator > >(CephContext*, std::vector std::allocator >&, int)+0x2c8) [0x557562660e36]
> 10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x185) [0x557562844c4d]
> 11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v14_2_0::list&, int)+0xbbf) [0x557562842785]
> 12: (C_IO_MDC_OpenInoBacktraceFetched::finish(int)+0x37) [0x557562886a31]
> 13: (Context::complete(int)+0x27) [0x557562657fbf]
> 14: (MDSContext::complete(int)+0x152) [0x557562b04aa4]
> 15: (MDSIOContextBase::complete(int)+0x345) [0x557562b0522d]
> 16: (Finisher::finisher_thread_entry()+0x38b) [0x7faf0a9033e1]
> 17: (Finisher::FinisherThread::entry()+0x1c) [0x5575626a2772]
> 18: (Thread::entry_wrapper()+0x78) [0x7faf0a97203c]
> 19: (Thread::_entry_func(void*)+0x18) [0x7faf0a971fba]
> 20: (()+0x7dd5) [0x7faf07844dd5]
> 21: (clone()+0x6d) [0x7faf064f502d]
>
> I tried removing it, but it does not show up in the omapkeys for that inode:
>
> lima:/home/neale$ ceph -- rados -p cephfs_metadata listomapkeys 1995e63.
> __about__.py_head
> __init__.py_head
> __pycache___head
> _compat.py_head
> _structures.py_head
> markers.py_head
> requirements.py_head
> specifiers.py_head
> utils.py_head
> version.py_head
> lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. _compat.py_head
> lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. compat.py_head
> lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. file-does-not-exist_head
> lima:/home/neale$ ceph -- rados -p cephfs_metadata listomapkeys 1995e63.
> __about__.py_head
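The backtrace above passes through OpenFileTable::_prefetch_dirfrags, i.e. the assert fires while the MDS prefetches dirfrags for inodes recorded in its open file table during startup. A read-only way to look at those objects in the metadata pool, assuming the usual mds<rank>_openfiles.<n> object naming (an assumption, not something stated in the thread):

  # List open-file-table objects for all ranks (read-only).
  rados -p cephfs_metadata ls | grep -i openfiles

  # Inspect the omap keys of rank 0's table, assuming the conventional object name.
  rados -p cephfs_metadata listomapkeys mds0_openfiles.0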
[ceph-users] Problematic inode preventing ceph-mds from starting
Last week I asked about a rogue inode that was causing ceph-mds to segfault during replay. We didn't get any suggestions from this list, so we have been familiarizing ourselves with the ceph source code, and have added the following patch:

--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -736,6 +736,13 @@ CDir *CInode::get_approx_dirfrag(frag_t fg)

 CDir *CInode::get_or_open_dirfrag(MDCache *mdcache, frag_t fg)
 {
+  if (!is_dir()) {
+    ostringstream oss;
+    JSONFormatter f(true);
+    dump(&f, DUMP_PATH | DUMP_INODE_STORE_BASE | DUMP_MDS_CACHE_OBJECT | DUMP_LOCKS | DUMP_STATE | DUMP_CAPS | DUMP_DIRFRAGS);
+    f.flush(oss);
+    dout(0) << oss.str() << dendl;
+  }
   ceph_assert(is_dir());

   // have it?

This has given us a culprit:

-2> 2019-10-18 16:19:06.934 7faefa470700  0 mds.0.cache.ino(0x1995e63)
"/unimportant/path/we/can/tolerate/losing/compat.py"10995216789470"2018-03-24 03:18:17.621969""2018-03-24 03:18:17.620969"3318855521001
{ "dir_hash": 0 }
{ "stripe_unit": 4194304, "stripe_count": 1, "object_size": 4194304, "pool_id": 1, "pool_ns": "" }
[]
3411844674407370955161500"2015-01-27 16:01:52.467669""2018-03-24 03:18:17.621969"21-1[]
{ "version": 0, "mtime": "0.00", "num_files": 0, "num_subdirs": 0 }
{ "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.00" }
{ "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.00" }
2540123[]
{ "splits": [] }
true{ "replicas": {} }
{ "authority": [ 0, -2 ], "replica_nonce": 0 }
0falsefalse{}
0{ "gather_set": [], "state": "lock", "is_leased": false, "num_rdlocks": 0, "num_wrlocks": 0, "num_xlocks": 0, "xlock_by": {} }
{} {} {} {} {} {} {} {} {}
[ "auth" ]
[]
-1-1[]
[]

-1> 2019-10-18 16:19:06.964 7faefa470700 -1
/opt/app-root/src/ceph/src/mds/CInode.cc: In function 'CDir*
CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7faefa470700 time
2019-10-18 16:19:06.934662
/opt/app-root/src/ceph/src/mds/CInode.cc: 746: FAILED ceph_assert(is_dir())

ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1aa) [0x7faf0a9ce39e]
2: (()+0x12a8620) [0x7faf0a9ce620]
3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x253) [0x557562a4b1ad]
4: (OpenFileTable::_prefetch_dirfrags()+0x4db) [0x557562b63d63]
5: (OpenFileTable::_open_ino_finish(inodeno_t, int)+0x16a) [0x557562b63720]
6: (C_OFT_OpenInoFinish::finish(int)+0x2d) [0x557562b67699]
7: (Context::complete(int)+0x27) [0x557562657fbf]
8: (MDSContext::complete(int)+0x152) [0x557562b04aa4]
9: (void finish_contexts > >(CephContext*, std::vector >&, int)+0x2c8) [0x557562660e36]
10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x185) [0x557562844c4d]
11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v14_2_0::list&, int)+0xbbf) [0x557562842785]
12: (C_IO_MDC_OpenInoBacktraceFetched::finish(int)+0x37) [0x557562886a31]
13: (Context::complete(int)+0x27) [0x557562657fbf]
14: (MDSContext::complete(int)+0x152) [0x557562b04aa4]
15: (MDSIOContextBase::complete(int)+0x345) [0x557562b0522d]
16: (Finisher::finisher_thread_entry()+0x38b) [0x7faf0a9033e1]
17: (Finisher::FinisherThread::entry()+0x1c) [0x5575626a2772]
18: (Thread::entry_wrapper()+0x78) [0x7faf0a97203c]
19: (Thread::_entry_func(void*)+0x18) [0x7faf0a971fba]
20: (()+0x7dd5) [0x7faf07844dd5]
21: (clone()+0x6d) [0x7faf064f502d]

I tried removing it, but it does not show up in the omapkeys for that inode:

lima:/home/neale$ ceph -- rados -p cephfs_metadata listomapkeys 1995e63.
__about__.py_head
__init__.py_head
__pycache___head
_compat.py_head
_structures.py_head
markers.py_head
requirements.py_head
specifiers.py_head
utils.py_head
version.py_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. _compat.py_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. compat.py_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. file-does-not-exist_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata listomapkeys 1995e63.
__about__.py_head
__init__.py_head
__pycache___head
_structures.py_head
markers.py_head
requirements.py_head
specifiers.py_head
utils.py_head
version.py_head

Predictably, this did nothing to solve our problem, and ceph-mds is still dying during startup. Any suggestions?

Neale Pickett
A-4: Advanced Research in Cyber Systems
Los Alamos National Laboratory

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
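For anyone repeating this kind of surgery: before removing dentry keys with rmomapkey, it is worth saving the dirfrag's omap first so the change can be undone. A sketch with placeholders, since the exact object name is truncated above; <dirfrag-object> and <dentry-key> are stand-ins, not values taken from this thread:

  # Dump every omap key/value of the dirfrag object to a file before touching it.
  rados -p cephfs_metadata listomapvals <dirfrag-object> > dirfrag-omap-backup.txt

  # Save the raw value of one dentry key in case it has to be restored later.
  rados -p cephfs_metadata getomapval <dirfrag-object> <dentry-key> dentry-backup.bin

  # Only then remove the key.
  rados -p cephfs_metadata rmomapkey <dirfrag-object> <dentry-key>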