Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
Hmm, looks like I restarted everything except the MDS daemons... so it's the same issue! That's why the MDSes killed themselves during the reboot of one of the monitors: its co-located MDS was still on 12.2.2. Thanks Dan!

Adrien

On 28/03/2018 at 16:43, Dan van der Ster wrote:
> Do you have the startup banners for mds.cccephadm14 and 15? It sure
> looks like they were running 12.2.2 with the "not writeable with
> daemon features" error.
>
> -- dan
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
Do you have the startup banners for mds.cccephadm14 and 15? It sure looks like they were running 12.2.2 with the "not writeable with daemon features" error.

-- dan
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
Hi,

All Ceph services were on version 12.2.4.

Adrien
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
Hi,

Which versions were those MDSes running before and after the standby MDS restarted?

Cheers,
Dan
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
Hi,

I just had the same issue with our 12.2.4 cluster, but not during the upgrade. One of our 3 monitors restarted (the one with a standby MDS) and the 2 active MDSes killed themselves:

2018-03-28 09:36:24.376888 7f910bc0f700  0 mds.cccephadm14 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.376903 7f910bc0f700  1 mds.cccephadm14 suicide. wanted state up:active
2018-03-28 09:36:25.379607 7f910bc0f700  1 mds.1.62 shutdown: shutting down rank 1

2018-03-28 09:36:24.375867 7fad455bf700  0 mds.cccephadm15 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.375883 7fad455bf700  1 mds.cccephadm15 suicide. wanted state up:active
2018-03-28 09:36:25.377633 7fad455bf700  1 mds.0.50 shutdown: shutting down rank 0

I had to manually restart the MDS services to get them working again.

Adrien
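For readers hitting this log for the first time: the failure is a CompatSet mismatch. Below is a rough Python sketch (not the actual Ceph C++ code) of the check, using the two incompat sets copied from the log above; the real CompatSet check works on feature-id bitmasks, and the names are informational.

```python
# The two incompat feature sets from the log above. Note that id 8 names
# different features in the two sets -- exactly the ambiguity the
# compat-flag disambiguation patch discussed in this thread was fixing.
MDSMAP_INCOMPAT = {
    1: "base v0.20", 2: "client writeable ranges",
    3: "default file layouts on dirs", 4: "dir inode in separate object",
    5: "mds uses versioned encoding", 6: "dirfrag is stored in omap",
    8: "no anchor table", 9: "file layout v2",
}
OLD_DAEMON_INCOMPAT = {
    1: "base v0.20", 2: "client writeable ranges",
    3: "default file layouts on dirs", 4: "dir inode in separate object",
    5: "mds uses versioned encoding", 6: "dirfrag is stored in omap",
    7: "mds uses inline data", 8: "file layout v2",
}

def unsupported_ids(map_incompat, daemon_incompat):
    """Feature ids the map requires but the daemon does not advertise.
    Non-empty means the map is 'not writeable' by that daemon."""
    return set(map_incompat) - set(daemon_incompat)

missing = unsupported_ids(MDSMAP_INCOMPAT, OLD_DAEMON_INCOMPAT)
renamed = {i for i in MDSMAP_INCOMPAT.keys() & OLD_DAEMON_INCOMPAT.keys()
           if MDSMAP_INCOMPAT[i] != OLD_DAEMON_INCOMPAT[i]}
print(missing)   # the 12.2.2 daemon lacks id 9, so the map is "not writeable"
print(renamed)   # and id 8 means two different things in the two sets
```

So a still-running 12.2.2 MDS that receives the new-style map has no choice but to respect the required features and kill itself, which is the "suicide. wanted state up:active" line.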
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
Just ran into this problem on our production cluster.

It would have been nice if the release notes for 12.2.4 had been updated to warn users about this.

Best,
Martin
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree wrote:
> Would such a patch be accepted if we ended up pursuing this? Any
> suggestions on how to best go about this?

It'd be ugly, but you'd have to set it up so that:
* new MDSes advertise the old set of required values
* but can identify when all the MDSes are new
* then mark somewhere that they can use the correct values
* then switch to the proper requirements

I don't remember the details of this CompatSet code any more, and it's definitely made trickier by the MDS having no permanent local state. Since we luckily have both the IDs and the strings, you might be able to do something in the MDSMonitor to identify whether booting MDSes fall into "too old", "old feature set but supports the new feature", or "new, with correct feature advertising", and then either massage the incoming message down to "old feature set but supports the new feature" (if not all the MDSes are new) or auto-upgrade the required features in the map. You might also need compatibility code in the MDS to make sure it sends out the appropriate bits on connection, but I *think* the CompatSet checks are only done on the monitor and when an MDS receives an MDSMap.

-Greg
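Greg's triage idea could look roughly like the following sketch. Everything here is hypothetical glue: the names are invented, the feature sets are reduced to bare ids, and `binary_is_new` stands in for however the monitor would actually learn that a booting daemon runs a new binary (e.g. from its reported release). It only illustrates the decision flow, not real MDSMonitor code.

```python
# Simplified incompat ids: the pre-patch set that 12.2.2 daemons advertise,
# and the corrected set a fully-new cluster should require.
OLD_INCOMPAT = frozenset({7, 8})   # old ids (8 = "file layout v2")
NEW_INCOMPAT = frozenset({8, 9})   # corrected ids (9 = "file layout v2")

def classify(advertised, binary_is_new):
    """Triage a booting MDS into Greg's three buckets."""
    if advertised >= NEW_INCOMPAT:
        return "new-correct"
    if binary_is_new:
        return "old-featureset-but-supports-new-feature"
    return "too-old"

def required_features(booting):
    """If no genuinely old daemon remains, auto-upgrade the map's required
    set; otherwise massage everything down to the old set so that the old
    daemons keep finding the map writeable."""
    kinds = {classify(adv, is_new) for adv, is_new in booting}
    return NEW_INCOMPAT if "too-old" not in kinds else OLD_INCOMPAT

# A mixed cluster where every binary is new flips to the corrected set:
cluster = [(NEW_INCOMPAT, True), (OLD_INCOMPAT, True)]
print(required_features(cluster))
```

The awkward part Greg alludes to is precisely the middle bucket: without permanent local MDS state, the monitor has to infer "new binary in old clothing" from information carried on the wire.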
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On 2018-03-14T06:57:08, Patrick Donnelly wrote:
> Yes. But the real outcome is not "no MDS [is] active" but "some or all
> metadata I/O will pause" -- and there is no avoiding that.

Fair, except that there's no standby MDS at this point in case the update goes wrong.

> > Is another approach theoretically feasible? Have the updated MDS only go
> > into the incompatible mode once there's a quorum of new ones available,
> > or something?
> I believe so, yes. That option wasn't explored for this patch because
> it was just disambiguating the compatibility flags and the full
> side-effects weren't realized.

Would such a patch be accepted if we ended up pursuing this? Any suggestions on how best to go about it?

Anything that requires magic sauce on updates beyond the normal "MONs first, then roll through" makes me twitchy, and tends to end with at least a few customers getting it not quite right ;-)

Regards,
Lars
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On Wed, Mar 14, 2018 at 5:48 AM, Lars Marowsky-Bree wrote:
> This means that - when the single active is being updated - there's a
> time when there's no MDS active, right?

Yes. But the real outcome is not "no MDS [is] active" but "some or all metadata I/O will pause" -- and there is no avoiding that. During an MDS upgrade, a standby must take over from the MDS being shut down (and upgraded). During takeover, metadata I/O will briefly pause while the rank is unavailable. (Specifically, no other rank can obtain locks from, or communicate with, the "failed" rank, so metadata I/O must pause until a standby takes over.) Single active vs. multiple active makes little difference to this outcome.

> Is another approach theoretically feasible? Have the updated MDS only go
> into the incompatible mode once there's a quorum of new ones available,
> or something?

I believe so, yes. That option wasn't explored for this patch because it was just disambiguating the compatibility flags, and the full side effects weren't realized.

--
Patrick Donnelly
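The workaround Patrick originally proposed (quoted elsewhere in this thread: reduce max_mds to 1, deactivate the other ranks, stop the standbys, upgrade the lone active, then bring the upgraded standbys back) can be sketched as a dry-run command plan. The filesystem name, the standby host names, and the Luminous-era `ceph mds deactivate` syntax are assumptions here; treat it as a sketch of the ordering, not an authoritative runbook.

```python
def single_active_upgrade_plan(fs="cephfs", active_ranks=2,
                               standbys=("mds-b", "mds-c")):
    """Build the ordered command list for a single-active MDS upgrade
    (hypothetical names; commands are printed, never executed)."""
    steps = [f"ceph fs set {fs} max_mds 1"]
    # deactivate every rank except rank 0, highest first
    steps += [f"ceph mds deactivate {fs}:{r}"
              for r in range(active_ranks - 1, 0, -1)]
    # stop standbys so an un-upgraded daemon cannot take over mid-upgrade
    steps += [f"systemctl stop ceph-mds@{s}" for s in standbys]
    steps.append("<upgrade packages and restart the single remaining active mds>")
    # upgrade and start the standbys last
    steps += [f"systemctl start ceph-mds@{s}" for s in standbys]
    return steps

for step in single_active_upgrade_plan():
    print(step)   # dry run only
```

The point of the ordering is exactly what this thread established: as long as a 12.2.2 daemon can become active against a 12.2.4 mdsmap, it will suicide, so old standbys must be down before the map changes.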
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On 03/14/2018 01:48 PM, Lars Marowsky-Bree wrote: > On 2018-02-28T02:38:34, Patrick Donnelly wrote: > >> I think it will be necessary to reduce the actives to 1 (max_mds -> 1; >> deactivate other ranks), shutdown standbys, upgrade the single active, >> then upgrade/start the standbys. >> >> Unfortunately this didn't get flagged in upgrade testing. Thanks for >> the report Dan. > > This means that - when the single active is being updated - there's a > time when there's no MDS active, right? > > Is another approach theoretically feasible? Have the updated MDS only go > into the incompatible mode once there's a quorum of new ones available, > or something? > > (From the point of view of a distributed system, this is double plus > ungood.) Here is what I did, and it worked without any problem. We have mons and mds on the same hosts, 3 hosts in total: 1. stop the 2 non-active mds 2. update ceph on all 3 hosts 3. restart the active mds 4. start the mds on the others HTH Dietmar -- _ D i e t m a r R i e d e r, Mag.Dr. Innsbruck Medical University Biocenter - Division for Bioinformatics Email: dietmar.rie...@i-med.ac.at Web: http://www.icbi.at
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On 2018-02-28T02:38:34, Patrick Donnelly wrote: > I think it will be necessary to reduce the actives to 1 (max_mds -> 1; > deactivate other ranks), shutdown standbys, upgrade the single active, > then upgrade/start the standbys. > > Unfortunately this didn't get flagged in upgrade testing. Thanks for > the report Dan. This means that - when the single active is being updated - there's a time when there's no MDS active, right? Is another approach theoretically feasible? Have the updated MDS only go into the incompatible mode once there's a quorum of new ones available, or something? (From the point of view of a distributed system, this is double plus ungood.) Regards, Lars -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On Wed, Feb 28, 2018 at 11:05 AM, John Spray wrote: > On Wed, Feb 28, 2018 at 9:37 AM, Dan van der Ster wrote: >> Hi all, >> >> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and >> OSD's updated fine. >> >> When updating the MDS's (we have 2 active and 1 standby), I started >> with the standby. >> >> At the moment the standby MDS restarted into 12.2.4 [1], both active >> MDSs (still running 12.2.2) suicided like this: >> >> 2018-02-28 10:25:22.761413 7f03da1b9700 0 mds.cephdwightmds0 >> handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base >> v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir >> inode in separate object,5=mds uses versioned encoding,6=dirfrag is >> stored in omap,8=no anchor table,9=file layout v2} not writeable with >> daemon features compat={},rocompat={},incompat={1=base v0.20,2=client >> writeable ranges,3=default file layouts on dirs,4=dir inode in >> separate object,5=mds uses versioned encoding,6=dirfrag is stored in >> omap,7=mds uses inline data,8=file layout v2}, killing myself >> 2018-02-28 10:25:22.761429 7f03da1b9700 1 mds.cephdwightmds0 suicide. >> wanted state up:active >> 2018-02-28 10:25:23.763226 7f03da1b9700 1 mds.0.18147 shutdown: >> shutting down rank 0 >> >> >> 2018-02-28 10:25:22.761590 7f11df538700 0 mds.cephdwightmds1 >> handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file >> layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor >> table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client >> writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag >> is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself >> 2018-02-28 10:25:22.761613 7f11df538700 1 mds.cephdwightmds1 suicide. 
>> wanted state up:active >> 2018-02-28 10:25:23.765653 7f11df538700 1 mds.1.18366 shutdown: >> shutting down rank 1 > > That's not good! > > From looking at the commits between 12.2.2 and 12.2.4, this one looks > suspicious: > > commit ddba907279719631903e3a20543056d81d176a1b > Author: Yan, Zheng > Date: Tue Oct 31 16:56:51 2017 +0800 > > mds: fix MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 definition > > Fixes: http://tracker.ceph.com/issues/21985 > Signed-off-by: "Yan, Zheng" > (cherry picked from commit 6c1543dfc55d6db8493535b9b62a30236cf8c638) Apologies for the noise, my mail client hadn't loaded the earlier responses in which this was already pointed out. John > John > > > >> >> >> The cephfs cluster was down until I updated all MDS's to 12.2.4 -- >> then they restarted cleanly. >> >> Looks like a pretty serious bug??!! >> >> Cheers, Dan >> >> >> [1] here is the standby restarting, 4 seconds before the active MDS's >> suicided: >> >> 2018-02-28 10:25:18.222865 7f9f1ea3b1c0 0 set uid:gid to 167:167 (ceph:ceph) >> 2018-02-28 10:25:18.222892 7f9f1ea3b1c0 0 ceph version 12.2.4 >> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process >> (unknown), pid 10648
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On Wed, Feb 28, 2018 at 9:37 AM, Dan van der Ster wrote: > Hi all, > > I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and > OSD's updated fine. > > When updating the MDS's (we have 2 active and 1 standby), I started > with the standby. > > At the moment the standby MDS restarted into 12.2.4 [1], both active > MDSs (still running 12.2.2) suicided like this: > > 2018-02-28 10:25:22.761413 7f03da1b9700 0 mds.cephdwightmds0 > handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base > v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir > inode in separate object,5=mds uses versioned encoding,6=dirfrag is > stored in omap,8=no anchor table,9=file layout v2} not writeable with > daemon features compat={},rocompat={},incompat={1=base v0.20,2=client > writeable ranges,3=default file layouts on dirs,4=dir inode in > separate object,5=mds uses versioned encoding,6=dirfrag is stored in > omap,7=mds uses inline data,8=file layout v2}, killing myself > 2018-02-28 10:25:22.761429 7f03da1b9700 1 mds.cephdwightmds0 suicide. > wanted state up:active > 2018-02-28 10:25:23.763226 7f03da1b9700 1 mds.0.18147 shutdown: > shutting down rank 0 > > > 2018-02-28 10:25:22.761590 7f11df538700 0 mds.cephdwightmds1 > handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file > layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor > table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client > writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag > is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself > 2018-02-28 10:25:22.761613 7f11df538700 1 mds.cephdwightmds1 suicide. 
> wanted state up:active > 2018-02-28 10:25:23.765653 7f11df538700 1 mds.1.18366 shutdown: > shutting down rank 1 That's not good! From looking at the commits between 12.2.2 and 12.2.4, this one looks suspicious: commit ddba907279719631903e3a20543056d81d176a1b Author: Yan, Zheng Date: Tue Oct 31 16:56:51 2017 +0800 mds: fix MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 definition Fixes: http://tracker.ceph.com/issues/21985 Signed-off-by: "Yan, Zheng" (cherry picked from commit 6c1543dfc55d6db8493535b9b62a30236cf8c638) John > > > The cephfs cluster was down until I updated all MDS's to 12.2.4 -- > then they restarted cleanly. > > Looks like a pretty serious bug??!! > > Cheers, Dan > > > [1] here is the standby restarting, 4 seconds before the active MDS's > suicided: > > 2018-02-28 10:25:18.222865 7f9f1ea3b1c0 0 set uid:gid to 167:167 (ceph:ceph) > 2018-02-28 10:25:18.222892 7f9f1ea3b1c0 0 ceph version 12.2.4 > (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process > (unknown), pid 10648
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On Wed, Feb 28, 2018 at 11:38 AM, Patrick Donnelly wrote: > On Wed, Feb 28, 2018 at 2:07 AM, Dan van der Ster wrote: >> (Sorry to spam) >> >> I guess it's related to this fix to the layout v2 feature id: >> https://github.com/ceph/ceph/pull/18782/files >> >> -#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8, >> "file layout v2") >> +#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9, >> "file layout v2") > > Yes, this looks to be the issue. > >> Is there a way to update from 12.2.2 without causing the other active >> MDS's to suicide? > > I think it will be necessary to reduce the actives to 1 (max_mds -> 1; > deactivate other ranks), shutdown standbys, upgrade the single active, > then upgrade/start the standbys. > > Unfortunately this didn't get flagged in upgrade testing. Thanks for > the report Dan. Thanks Patrick -- that's a good idea to reduce to 1 active. I've created http://tracker.ceph.com/issues/23172 in case any followup is needed. Cheers, Dan
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
On Wed, Feb 28, 2018 at 2:07 AM, Dan van der Ster wrote: > (Sorry to spam) > > I guess it's related to this fix to the layout v2 feature id: > https://github.com/ceph/ceph/pull/18782/files > > -#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8, > "file layout v2") > +#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9, > "file layout v2") Yes, this looks to be the issue. > Is there a way to update from 12.2.2 without causing the other active > MDS's to suicide? I think it will be necessary to reduce the actives to 1 (max_mds -> 1; deactivate other ranks), shutdown standbys, upgrade the single active, then upgrade/start the standbys. Unfortunately this didn't get flagged in upgrade testing. Thanks for the report Dan. -- Patrick Donnelly
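[Editor's note] For anyone following along, Patrick's sequence could look roughly like the sketch below. The filesystem name ("cephfs"), the rank numbers, and the systemd unit names are placeholders for illustration, not taken from this thread; the exact `deactivate` syntax varies a little by release, so check `ceph mds deactivate -h` on your version first.

```shell
# Sketch of the single-active upgrade sequence described above.
# Assumes a filesystem named "cephfs" with ranks 0 and 1, and
# systemd-managed daemons mds.a (active) plus mds.b/mds.c (standbys).
upgrade_mds_single_active() {
    ceph fs set cephfs max_mds 1                # shrink to a single active rank
    ceph mds deactivate cephfs:1                # stop rank 1 (repeat for any higher ranks)
    systemctl stop ceph-mds@b ceph-mds@c        # shut down the standbys
    # ... upgrade the ceph packages on the remaining active MDS host ...
    systemctl restart ceph-mds@a                # restart the single active on the new version
    # ... upgrade the standby hosts, then bring them back:
    systemctl start ceph-mds@b ceph-mds@c
}
```

Once everything is back and healthy, `max_mds` can be raised again to restore the multi-active layout.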
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
(Sorry to spam) I guess it's related to this fix to the layout v2 feature id: https://github.com/ceph/ceph/pull/18782/files -#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8, "file layout v2") +#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9, "file layout v2") Is there a way to update from 12.2.2 without causing the other active MDS's to suicide? Cheers, Dan On Wed, Feb 28, 2018 at 11:01 AM, Dan van der Ster wrote: > More: > > here is the MDS_FEATURES map for a running 12.2.2 cluster: > > compat: compat={},rocompat={},incompat={1=base v0.20,2=client > writeable ranges,3=default file layouts on dirs,4=dir inode in > separate object,5=mds uses versioned encoding,6=dirfrag is stored in > omap,8=file layout v2} > > and here it is on this updated 12.2.4 cluster: > > compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable > ranges,3=default file layouts on dirs,4=dir inode in separate > object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no > anchor table,9=file layout v2} > > > feature bit 8 is not the same for these two. Am I confused or did > these features get changed in 12.2.3/4. > > > Cheers, Dan > > p.s. yes 12.2.4 is tagged and out -- check your favourite repo.
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
More: here is the MDS_FEATURES map for a running 12.2.2 cluster: compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2} and here it is on this updated 12.2.4 cluster: compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} feature bit 8 is not the same for these two. Am I confused, or did these features get changed in 12.2.3/4? Cheers, Dan p.s. yes 12.2.4 is tagged and out -- check your favourite repo.
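[Editor's note] To make the bit shuffle explicit, here is a small hypothetical shell helper (not from this thread) that pulls an individual incompat feature out of the two maps quoted above; the `old`/`new` strings are the feature lists copied verbatim from the `compat` lines, minus the `incompat={...}` wrapper:

```shell
# Compare the 12.2.2 and 12.2.4 incompat feature lists bit by bit.
old='1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2'
new='1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2'

feature() {  # feature <bit> <list> -> name assigned to that incompat bit, if set
    printf '%s\n' "$2" | tr ',' '\n' | sed -n "s/^$1=//p"
}

feature 8 "$old"   # 12.2.2: "file layout v2"
feature 8 "$new"   # 12.2.4: "no anchor table"
feature 9 "$new"   # 12.2.4: "file layout v2" moved here
```

Bit 8 names a different feature on each side, which is exactly the mismatch the 12.2.2 daemons reject as "not writeable with daemon features" before suiciding.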
Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
Quoting Dan van der Ster (d...@vanderster.com): > Hi all, > > I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and > OSD's updated fine. 12.2.4? Did you mean 12.2.3? Or did I miss something? Gr. stefan -- | BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351 | GPG: 0xD14839C6 +31 318 648 688 / i...@bit.nl
[ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide
Hi all, I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and OSD's updated fine. When updating the MDS's (we have 2 active and 1 standby), I started with the standby. At the moment the standby MDS restarted into 12.2.4 [1], both active MDSs (still running 12.2.2) suicided like this: 2018-02-28 10:25:22.761413 7f03da1b9700 0 mds.cephdwightmds0 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself 2018-02-28 10:25:22.761429 7f03da1b9700 1 mds.cephdwightmds0 suicide. wanted state up:active 2018-02-28 10:25:23.763226 7f03da1b9700 1 mds.0.18147 shutdown: shutting down rank 0 2018-02-28 10:25:22.761590 7f11df538700 0 mds.cephdwightmds1 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself 2018-02-28 10:25:22.761613 7f11df538700 1 mds.cephdwightmds1 suicide. 
Cheers, Dan [1] here is the standby restarting, 4 seconds before the active MDS's suicided: 2018-02-28 10:25:18.222865 7f9f1ea3b1c0 0 set uid:gid to 167:167 (ceph:ceph) 2018-02-28 10:25:18.222892 7f9f1ea3b1c0 0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 10648
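[Editor's note] Before touching the MDSs it is worth confirming what every daemon is actually running and what the mdsmap currently pins. A sketch, assuming the luminous-era CLI (`ceph versions` first appeared in luminous); the function is shown for reference and needs a live cluster to run:

```shell
# Inspect running daemon versions and the mdsmap compat set.
check_mds_versions() {
    ceph versions               # per-daemon-type version breakdown (mon/mgr/osd/mds)
    ceph fs dump | grep compat  # compat set currently recorded in the mdsmap
}
```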