Re: [ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
On 11/12/2017 15:13, Tobias Prousa wrote:
> Hi there, I'm running a CEPH cluster for some libvirt VMs and a CephFS providing /home to ~20 desktop machines. There are 4 hosts running 4 MONs, 4 MGRs, 3 MDSs (1 active, 2 standby) and 28 OSDs in total. This cluster has been up and running since the days of Bobtail (yes, including CephFS).

You might consider shutting down 1 MON, since MONs should come in odd numbers, and for your cluster 3 is more than sufficient. For the reasons why, read either the Ceph docs or search this mailing list. This probably doesn't help with your problem, but it could help prevent a split-brain situation in the future.

--WjW
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
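For readers wanting to follow WjW's suggestion, a dry-run sketch of dropping the fourth MON is below. The hostname "host4" is a placeholder, not from the thread; the commands are echoed rather than executed because they change cluster membership.

```shell
# Dry-run sketch: reduce 4 MONs to 3 for an odd-sized quorum.
# "host4" is a hypothetical hostname; drop the echos to actually run this.
mon_to_remove=host4
echo "systemctl stop ceph-mon@${mon_to_remove}"   # stop the daemon first
echo "ceph mon remove ${mon_to_remove}"           # then remove it from the monmap
```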
Re: [ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
Hi Zheng,

I think I managed to understand what you suggested I do. The highest inode reported as erroneously marked free was almost exactly the highest inode in the output of "cephfs-table-tool all show inode". So I used take_inos as you suggested, with a max_ino value slightly higher than that. Now the MDS runs and I have started an online MDS scrub. What confused me in the beginning was that inode numbers in the show output are given in hex, but when using take_inos they have to be specified in decimal. I had to study the source code to figure that out...

Tomorrow morning I'll see if things got stable again. Once again, thank you very much for your support. I will report back to the ML when I have news.

Best regards,
Tobi

On 12/11/2017 05:19 PM, Tobias Prousa wrote:
> Hi Zheng, I did some more tests with cephfs-table-tool. I realized that disaster recovery implies possibly resetting the inode table completely, besides doing a session reset, using something like "cephfs-table-tool all reset inode". Would that be close to what you suggested? Is it safe to reset the complete inode table, or will that wipe my file system? Btw., "cephfs-table-tool all show inode" gives me ~400k inodes, part of them in section 'free', part of them in section 'projected_free'.
>
> On 12/11/2017 04:28 PM, Yan, Zheng wrote:
>> On Mon, Dec 11, 2017 at 11:17 PM, Tobias Prousa wrote:
>>> These are essentially the first commands I did execute, in this exact order. Additionally I did a: ceph fs reset cephfs --yes-i-really-mean-it
>> How many active MDS were there before the upgrade?
>>> Any hint on how to find the max inode number, and do I understand correctly that I should remove every free-marked inode number that is there except the biggest one, which has to stay?
>> If you are not sure, you can just try removing 1 inode number from the inode table.
>>> How to remove those inodes using cephfs-table-tool?
>> Using cephfs-table-tool take_inos

--
---
Dipl.-Inf. (FH) Tobias Prousa
Leiter Entwicklung Datenlogger

CAETEC GmbH
Industriestr. 1
D-82140 Olching
www.caetec.de

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Olching
Handelsregister: Amtsgericht München, HRB 183929
Geschäftsführung: Stephan Bacher, Andreas Wocke

Tel.: +49 (0)8142 / 50 13 60
Fax.: +49 (0)8142 / 50 13 69

eMail: tobias.pro...@caetec.de
Web: http://www.caetec.de
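A note for anyone hitting the same hex/decimal mismatch Tobias describes: cephfs-table-tool prints inode numbers in hex, while take_inos expects a decimal argument. A minimal conversion sketch follows; the inode number used here is made up purely for illustration, and the take_inos call is echoed as a dry run because it mutates the inode table.

```shell
# cephfs-table-tool shows inode numbers in hex, but take_inos wants decimal.
# The inode number below is a hypothetical example, not from a real cluster.
hex_ino=0x10000a3b2c1
dec_ino=$(printf '%d' "$hex_ino")   # shell printf converts the 0x-prefixed hex
echo "$dec_ino"

# Dry run of the actual call; drop the echo only on a cluster being recovered:
echo "cephfs-table-tool all take_inos $dec_ino"
```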
Re: [ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
Hi Zheng,

I did some more tests with cephfs-table-tool. I realized that disaster recovery implies possibly resetting the inode table completely, besides doing a session reset, using something like "cephfs-table-tool all reset inode". Would that be close to what you suggested? Is it safe to reset the complete inode table, or will that wipe my file system?

Btw., "cephfs-table-tool all show inode" gives me ~400k inodes, part of them in section 'free', part of them in section 'projected_free'.

Thanks,
Tobi

On 12/11/2017 04:28 PM, Yan, Zheng wrote:
> On Mon, Dec 11, 2017 at 11:17 PM, Tobias Prousa wrote:
>> These are essentially the first commands I did execute, in this exact order. Additionally I did a: ceph fs reset cephfs --yes-i-really-mean-it
> How many active MDS were there before the upgrade?
>> Any hint on how to find the max inode number, and do I understand correctly that I should remove every free-marked inode number that is there except the biggest one, which has to stay?
> If you are not sure, you can just try removing 1 inode number from the inode table.
>> How to remove those inodes using cephfs-table-tool?
> Using cephfs-table-tool take_inos
Re: [ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
Hi Zheng,

On 12/11/2017 04:28 PM, Yan, Zheng wrote:
> On Mon, Dec 11, 2017 at 11:17 PM, Tobias Prousa wrote:
>> These are essentially the first commands I did execute, in this exact order. Additionally I did a: ceph fs reset cephfs --yes-i-really-mean-it
> How many active MDS were there before the upgrade?

The CephFS in all the years never ever had more than a single active MDS. It might be that the "ceph fs reset" was superfluous; killing all clients happened at the same time, so the reset might not have been the change that got it "working" again.

>> Any hint on how to find the max inode number, and do I understand correctly that I should remove every free-marked inode number that is there except the biggest one, which has to stay?
> If you are not sure, you can just try removing 1 inode number from the inode table.

I still do not get the meaning of all that inode removal. Wouldn't removing inodes drop files, i.e. cause data loss? And do those falsely free-marked inodes mean that, if I start writing to my CephFS (in case I get the MDS working stably again), it would write new data to inodes that are actually already in use, again causing data loss? I would like to understand what I'm doing before I do it ;)

>> How to remove those inodes using cephfs-table-tool?
> Using cephfs-table-tool take_inos

Is there any documentation for cephfs-table-tool? And which inodes would I want to remove? Again, I would like to understand what's happening.

Thank you so much for taking your time. Your help is highly appreciated!

Best regards,
Tobi
Re: [ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
On Mon, Dec 11, 2017 at 11:17 PM, Tobias Prousa wrote:
>
> These are essentially the first commands I did execute, in this exact order.
> Additionally I did a:
>
> ceph fs reset cephfs --yes-i-really-mean-it
>
How many active MDS were there before the upgrade?

> Any hint on how to find the max inode number, and do I understand that I should
> remove every free-marked inode number that is there except the biggest one,
> which has to stay?

If you are not sure, you can just try removing 1 inode number from the inode table.

> How to remove those inodes using cephfs-table-tool?

Using cephfs-table-tool take_inos
Re: [ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
On 12/11/2017 04:05 PM, Yan, Zheng wrote:
>> Hi there, I'm running a CEPH cluster for some libvirt VMs and a CephFS providing /home to ~20 desktop machines. There are 4 hosts running 4 MONs, 4 MGRs, 3 MDSs (1 active, 2 standby) and 28 OSDs in total. This cluster has been up and running since the days of Bobtail (yes, including CephFS). Now with the update from 12.2.1 to 12.2.2 last Friday afternoon I restarted MONs, MGRs, OSDs as usual. RBD is running just fine. But after trying to restart the MDSs, they tried replaying the journal, then fell back to standby, and the FS was in state "damaged". I finally got them back working after I did a good portion of what's described here: http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
>
> What commands did you run? You need to run the following commands:
>
> cephfs-journal-tool event recover_dentries summary
> cephfs-journal-tool journal reset
> cephfs-table-tool all reset session

These are essentially the first commands I did execute, in this exact order. Additionally I did a:

ceph fs reset cephfs --yes-i-really-mean-it

which then was the moment when I was able to restart the MDSs for the first time, back on Friday, IIRC.

Now when all clients are shut down I can start the MDS; it will replay and become active. I can then mount CephFS on a client and access my files and folders. But the more clients I bring up, the MDS will first report damaged metadata (probably due to some damaged paths, I could live with that) and then fail with an assert:

/build/ceph-12.2.2/src/mds/MDCache.cc: 258: FAILED assert(inode_map.count(in->vino()) == 0)

I tried doing an online CephFS scrub like

ceph daemon mds.a scrub_path / recursive repair

This will run for a couple of hours, always finding exactly 10001 damages of type "backtrace" and reporting that it would be fixing loads of erroneously free-marked inodes, until the MDS crashes. When I rerun that scrub after having killed all clients and restarted the MDSs, it again finds exactly those 10001 damages and begins fixing exactly the same free-marked inodes over again.

> Find the max inode number of these free-marked inodes, then use cephfs-table-tool to remove inode numbers that are smaller than the max number. You can remove a little more, just in case. Before doing this, you should stop the mds and run "cephfs-table-tool all reset session". If everything goes right, the mds will no longer trigger the assertion.

Any hint on how to find the max inode number? And do I understand correctly that I should remove every free-marked inode number that is there except the biggest one, which has to stay? How to remove those inodes using cephfs-table-tool?

Btw. CephFS has about 3 million objects in the metadata pool. The data pool is about 30 million objects with ~2.5TB * 3 replicas.

What I tried next is keeping the MDS down and doing

cephfs-data-scan scan_extents
cephfs-data-scan scan_inodes
cephfs-data-scan scan_links

As this is described to take "a very long time", this is what I initially skipped from the disaster-recovery tips. Right now I'm still on the first step, with 6 workers on a single host busy doing cephfs-data-scan scan_extents. ceph -s shows me client io of 20kB/s (!!!). If that's the real scan speed, this is going to take ages.

Any way to tell how long this is going to take? Could I speed things up by running more workers on multiple hosts simultaneously? Should I abort it, as I actually don't have the problem of lost files? Maybe running cephfs-data-scan scan_links would better suit my issue, or do scan_extents/scan_inodes HAVE to be run and finished first?

I have to get this cluster up and running again as soon as possible. Any help is highly appreciated. If there is anything I can help with, e.g. further information, feel free to ask. I'll try to hang around on #ceph (nick topro/topro_/topro__). FYI, I'm in the Central European time zone (UTC+1).

Thank you so much!

Best regards,
Tobi
Re: [ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
On Mon, Dec 11, 2017 at 10:13 PM, Tobias Prousa wrote:
> Hi there,
>
> I'm running a CEPH cluster for some libvirt VMs and a CephFS providing /home to ~20 desktop machines. There are 4 hosts running 4 MONs, 4 MGRs, 3 MDSs (1 active, 2 standby) and 28 OSDs in total. This cluster has been up and running since the days of Bobtail (yes, including CephFS).
>
> Now with the update from 12.2.1 to 12.2.2 last Friday afternoon I restarted MONs, MGRs, OSDs as usual. RBD is running just fine. But after trying to restart the MDSs, they tried replaying the journal, then fell back to standby, and the FS was in state "damaged". I finally got them back working after I did a good portion of what's described here:
>
> http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
>
> Now when all clients are shut down I can start the MDS; it will replay and become active. I can then mount CephFS on a client and access my files and folders. But the more clients I bring up, the MDS will first report damaged metadata (probably due to some damaged paths, I could live with that) and then fail with an assert:
>
> /build/ceph-12.2.2/src/mds/MDCache.cc: 258: FAILED assert(inode_map.count(in->vino()) == 0)
>
> I tried doing an online CephFS scrub like
>
> ceph daemon mds.a scrub_path / recursive repair
>
> This will run for a couple of hours, always finding exactly 10001 damages of type "backtrace" and reporting that it would be fixing loads of erroneously free-marked inodes, until the MDS crashes. When I rerun that scrub after having killed all clients and restarted the MDSs, it again finds exactly those 10001 damages and begins fixing exactly the same free-marked inodes over again.
>
> Btw. CephFS has about 3 million objects in the metadata pool. The data pool is about 30 million objects with ~2.5TB * 3 replicas.
>
> What I tried next is keeping the MDS down and doing
>
> cephfs-data-scan scan_extents
> cephfs-data-scan scan_inodes
> cephfs-data-scan scan_links
>
> As this is described to take "a very long time", this is what I initially skipped from the disaster-recovery tips. Right now I'm still on the first step, with 6 workers on a single host busy doing cephfs-data-scan scan_extents. ceph -s shows me client io of 20kB/s (!!!). If that's the real scan speed, this is going to take ages.
> Any way to tell how long this is going to take? Could I speed things up by running more workers on multiple hosts simultaneously?
> Should I abort it, as I actually don't have the problem of lost files? Maybe running cephfs-data-scan scan_links would better suit my issue, or do scan_extents/scan_inodes HAVE to be run and finished first?

you can interrupt scan_extents safely.

> I have to get this cluster up and running again as soon as possible. Any help is highly appreciated. If there is anything I can help with, e.g. further information, feel free to ask. I'll try to hang around on #ceph (nick topro/topro_/topro__). FYI, I'm in the Central European time zone (UTC+1).
>
> Thank you so much!
>
> Best regards,
> Tobi
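On the parallelism question: cephfs-data-scan can shard a scan with the --worker_n/--worker_m options, and the resulting shards can run concurrently, including on different hosts. A dry-run sketch is below; the data pool name "cephfs_data" is an assumption, and the commands are echoed rather than executed so nothing touches a live cluster.

```shell
# Generate one scan_extents invocation per worker shard. --worker_n/--worker_m
# split the scan into m disjoint shards that can run in parallel.
# Pool name "cephfs_data" is hypothetical; remove the echo to actually launch.
workers=4
cmds=$(for n in $(seq 0 $((workers - 1))); do
  echo "cephfs-data-scan scan_extents --worker_n $n --worker_m $workers cephfs_data"
done)
echo "$cmds"
```

Each generated command would typically be started in its own shell (or on its own host) and all of them must finish before moving on to scan_inodes.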
Re: [ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
On Mon, Dec 11, 2017 at 10:13 PM, Tobias Prousa wrote:
> Hi there,
>
> I'm running a CEPH cluster for some libvirt VMs and a CephFS providing /home to ~20 desktop machines. There are 4 hosts running 4 MONs, 4 MGRs, 3 MDSs (1 active, 2 standby) and 28 OSDs in total. This cluster has been up and running since the days of Bobtail (yes, including CephFS).
>
> Now with the update from 12.2.1 to 12.2.2 last Friday afternoon I restarted MONs, MGRs, OSDs as usual. RBD is running just fine. But after trying to restart the MDSs, they tried replaying the journal, then fell back to standby, and the FS was in state "damaged". I finally got them back working after I did a good portion of what's described here:
>
> http://docs.ceph.com/docs/master/cephfs/disaster-recovery/

What commands did you run? You need to run the following commands:

cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool journal reset
cephfs-table-tool all reset session

> Now when all clients are shut down I can start the MDS; it will replay and become active. I can then mount CephFS on a client and access my files and folders. But the more clients I bring up, the MDS will first report damaged metadata (probably due to some damaged paths, I could live with that) and then fail with an assert:
>
> /build/ceph-12.2.2/src/mds/MDCache.cc: 258: FAILED assert(inode_map.count(in->vino()) == 0)
>
> I tried doing an online CephFS scrub like
>
> ceph daemon mds.a scrub_path / recursive repair
>
> This will run for a couple of hours, always finding exactly 10001 damages of type "backtrace" and reporting that it would be fixing loads of erroneously free-marked inodes, until the MDS crashes. When I rerun that scrub after having killed all clients and restarted the MDSs, it again finds exactly those 10001 damages and begins fixing exactly the same free-marked inodes over again.

Find the max inode number of these free-marked inodes, then use cephfs-table-tool to remove inode numbers that are smaller than the max number. You can remove a little more, just in case. Before doing this, you should stop the mds and run "cephfs-table-tool all reset session". If everything goes right, the mds will no longer trigger the assertion.

> Btw. CephFS has about 3 million objects in the metadata pool. The data pool is about 30 million objects with ~2.5TB * 3 replicas.
>
> What I tried next is keeping the MDS down and doing
>
> cephfs-data-scan scan_extents
> cephfs-data-scan scan_inodes
> cephfs-data-scan scan_links
>
> As this is described to take "a very long time", this is what I initially skipped from the disaster-recovery tips. Right now I'm still on the first step, with 6 workers on a single host busy doing cephfs-data-scan scan_extents. ceph -s shows me client io of 20kB/s (!!!). If that's the real scan speed, this is going to take ages.
> Any way to tell how long this is going to take? Could I speed things up by running more workers on multiple hosts simultaneously?
> Should I abort it, as I actually don't have the problem of lost files? Maybe running cephfs-data-scan scan_links would better suit my issue, or do scan_extents/scan_inodes HAVE to be run and finished first?
>
> I have to get this cluster up and running again as soon as possible. Any help is highly appreciated. If there is anything I can help with, e.g. further information, feel free to ask. I'll try to hang around on #ceph (nick topro/topro_/topro__). FYI, I'm in the Central European time zone (UTC+1).
>
> Thank you so much!
>
> Best regards,
> Tobi
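For readers landing on this thread later, the three recovery commands Zheng lists can be collected into a single ordered sequence. The sketch below only echoes them: each command is destructive and belongs in disaster recovery only (with the MDS stopped), so the echos should be dropped deliberately, not by default.

```shell
# The three recovery steps from Zheng's reply, in order. Echoed as a dry run;
# run the real commands only while recovering a damaged filesystem, MDS stopped.
recover_steps() {
  echo "cephfs-journal-tool event recover_dentries summary"  # salvage dentries out of the journal
  echo "cephfs-journal-tool journal reset"                   # then discard the damaged journal
  echo "cephfs-table-tool all reset session"                 # finally clear stale client sessions
}
recover_steps
```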
[ceph-users] Upgrade from 12.2.1 to 12.2.2 broke my CephFs
Hi there,

I'm running a CEPH cluster for some libvirt VMs and a CephFS providing /home to ~20 desktop machines. There are 4 hosts running 4 MONs, 4 MGRs, 3 MDSs (1 active, 2 standby) and 28 OSDs in total. This cluster has been up and running since the days of Bobtail (yes, including CephFS).

Now with the update from 12.2.1 to 12.2.2 last Friday afternoon I restarted MONs, MGRs, OSDs as usual. RBD is running just fine. But after trying to restart the MDSs, they tried replaying the journal, then fell back to standby, and the FS was in state "damaged". I finally got them back working after I did a good portion of what's described here:

http://docs.ceph.com/docs/master/cephfs/disaster-recovery/

Now when all clients are shut down I can start the MDS; it will replay and become active. I can then mount CephFS on a client and access my files and folders. But the more clients I bring up, the MDS will first report damaged metadata (probably due to some damaged paths, I could live with that) and then fail with an assert:

/build/ceph-12.2.2/src/mds/MDCache.cc: 258: FAILED assert(inode_map.count(in->vino()) == 0)

I tried doing an online CephFS scrub like

ceph daemon mds.a scrub_path / recursive repair

This will run for a couple of hours, always finding exactly 10001 damages of type "backtrace" and reporting that it would be fixing loads of erroneously free-marked inodes, until the MDS crashes. When I rerun that scrub after having killed all clients and restarted the MDSs, it again finds exactly those 10001 damages and begins fixing exactly the same free-marked inodes over again.

Btw. CephFS has about 3 million objects in the metadata pool. The data pool is about 30 million objects with ~2.5TB * 3 replicas.

What I tried next is keeping the MDS down and doing

cephfs-data-scan scan_extents
cephfs-data-scan scan_inodes
cephfs-data-scan scan_links

As this is described to take "a very long time", this is what I initially skipped from the disaster-recovery tips. Right now I'm still on the first step, with 6 workers on a single host busy doing cephfs-data-scan scan_extents. ceph -s shows me client io of 20kB/s (!!!). If that's the real scan speed, this is going to take ages.

Any way to tell how long this is going to take? Could I speed things up by running more workers on multiple hosts simultaneously? Should I abort it, as I actually don't have the problem of lost files? Maybe running cephfs-data-scan scan_links would better suit my issue, or do scan_extents/scan_inodes HAVE to be run and finished first?

I have to get this cluster up and running again as soon as possible. Any help is highly appreciated. If there is anything I can help with, e.g. further information, feel free to ask. I'll try to hang around on #ceph (nick topro/topro_/topro__). FYI, I'm in the Central European time zone (UTC+1).

Thank you so much!

Best regards,
Tobi