Re: [ceph-users] Crashed MDS (segfault)

2019-10-25 Thread Gustavo Tonini
Well, I couldn't identify which object I need to "rmomapkey" as instructed in https://tracker.ceph.com/issues/38452#note-12. This is the log around the crash: https://pastebin.com/muw34Qdc On Fri, Oct 25, 2019 at 11:27 AM Yan, Zheng wrote: > On Fri, Oct 25, 2019 at 9:42 PM Gustavo Tonini >
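
The procedure in that tracker note relies on the rados omap commands; a minimal sketch, where the metadata pool name, object name, and key below are placeholders rather than values taken from this cluster:

    # list the omap keys on the suspect metadata-pool object
    rados -p cephfs_metadata listomapkeys 1.00000000
    # once the offending key is identified, remove it
    rados -p cephfs_metadata rmomapkey 1.00000000 <key_name>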

Re: [ceph-users] Crashed MDS (segfault)

2019-10-25 Thread Yan, Zheng
On Fri, Oct 25, 2019 at 9:42 PM Gustavo Tonini wrote: > > Running "cephfs-data-scan init --force-init" solved the problem. > > Then I had to run "cephfs-journal-tool event recover_dentries summary" and > truncate the journal to fix the corrupted journal. > > CephFS worked well for approximately
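
Before truncating, the journal can normally be exported as a backup with cephfs-journal-tool; a sketch, using the rank that appears elsewhere in this thread and an arbitrary output file name:

    # keep a copy of the journal before any destructive operation
    cephfs-journal-tool --rank=fs_padrao:0 journal export backup.bin
    # "truncate the journal" is assumed here to mean a journal reset
    cephfs-journal-tool --rank=fs_padrao:0 journal reset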

Re: [ceph-users] Crashed MDS (segfault)

2019-10-25 Thread Gustavo Tonini
Running "cephfs-data-scan init --force-init" solved the problem. Then I had to run "cephfs-journal-tool event recover_dentries summary" and truncate the journal to fix the corrupted journal. CephFS worked well for approximately 3 hours and then our MDS crashed again, apparently due to the bug

Re: [ceph-users] Crashed MDS (segfault)

2019-10-22 Thread Yan, Zheng
On Tue, Oct 22, 2019 at 1:49 AM Gustavo Tonini wrote: > > Is there a possibility to lose data if I use "cephfs-data-scan init > --force-init"? > It only causes an incorrect stat on the root inode; it can't cause data loss. Running 'ceph daemon mds.a scrub_path / force repair' after the MDS restart can fix
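
A sketch of that scrub invocation, assuming the daemon is named mds.a as in the reply and the command is run on the host with access to its admin socket:

    # repair the stat on the root inode after the MDS restarts
    ceph daemon mds.a scrub_path / force repair
    # adding "recursive" (an assumption about the desired scope) walks the whole tree
    ceph daemon mds.a scrub_path / recursive force repair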

Re: [ceph-users] Crashed MDS (segfault)

2019-10-21 Thread Gustavo Tonini
Is there a possibility to lose data if I use "cephfs-data-scan init --force-init"? On Mon, Oct 21, 2019 at 4:36 AM Yan, Zheng wrote: > On Fri, Oct 18, 2019 at 9:10 AM Gustavo Tonini > wrote: > > > > Hi Zheng, > > the cluster is running ceph mimic. This warning about network only > appears when

Re: [ceph-users] Crashed MDS (segfault)

2019-10-21 Thread Yan, Zheng
On Fri, Oct 18, 2019 at 9:10 AM Gustavo Tonini wrote: > > Hi Zheng, > the cluster is running ceph mimic. This warning about network only appears > when using nautilus' cephfs-journal-tool. > > "cephfs-data-scan scan_links" does not report any issue. > > How could variable "newparent" be NULL at

Re: [ceph-users] Crashed MDS (segfault)

2019-10-17 Thread Gustavo Tonini
Hi Zheng, the cluster is running ceph mimic. This warning about network only appears when using nautilus' cephfs-journal-tool. "cephfs-data-scan scan_links" does not report any issue. How could the variable "newparent" be NULL at https://github.com/ceph/ceph/blob/master/src/mds/SnapRealm.cc#L599 ?
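
For context, a minimal sketch of the scan_links run, assuming the filesystem is taken offline first and using the filesystem name that appears elsewhere in this thread:

    # take the filesystem down before running cephfs-data-scan (assumption)
    ceph fs set fs_padrao down true
    cephfs-data-scan scan_links
    ceph fs set fs_padrao down false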

Re: [ceph-users] Crashed MDS (segfault)

2019-10-17 Thread Yan, Zheng
On Thu, Oct 17, 2019 at 10:19 PM Gustavo Tonini wrote: > > No. The cluster was just rebalancing. > > The journal seems damaged: > > ceph@deployer:~$ cephfs-journal-tool --rank=fs_padrao:0 journal inspect > 2019-10-16 17:46:29.596 7fcd34cbf700 -1 NetHandler create_socket couldn't > create socket

Re: [ceph-users] Crashed MDS (segfault)

2019-10-17 Thread Yan, Zheng
On Tue, Oct 15, 2019 at 12:03 PM Gustavo Tonini wrote: > > Dear ceph users, > we're experiencing a segfault during MDS startup (replay process) which is > making our FS inaccessible. > > MDS log messages: > > Oct 15 03:41:39.894584 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 > 7f3c08f49700

[ceph-users] Crashed MDS (segfault)

2019-10-14 Thread Gustavo Tonini
Dear ceph users, we're experiencing a segfault during MDS startup (replay process) which is making our FS inaccessible. MDS log messages: Oct 15 03:41:39.894584 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 7f3c08f49700 1 -- 192.168.8.195:6800/3181891717 <== osd.26 192.168.8.209:6821/2419345 3
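
Since the daemon dies during replay, raising MDS logging before the restart usually captures more context around the segfault; a sketch of ceph.conf settings on the MDS host (levels chosen here as examples, to be reverted once the log is captured):

    [mds]
        debug mds = 20
        debug journaler = 20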