Re: [ceph-users] cephfs snapshot format upgrade
On Tue, Apr 10, 2018 at 8:50 PM, Yan, Zheng wrote:
> On Wed, Apr 11, 2018 at 3:34 AM, Gregory Farnum wrote:
>> Don't we have a (recursive!) last_scrub_[stamp|version] on all
>> directories? There's not (yet) a mechanism for associating that with
>> specific data versions like you describe here, but for a one-time
>> upgrade with unsupported features I don't think we need anything too
>> sophisticated.
>> -Greg
>>
> No, we don't. Besides, normal recursive stats (which record the last
> update) do not work for this case. We need a recursive stat that tracks
> the oldest update across all directories.

Well, inode_t has last_scrub_version and last_scrub_stamp members; they're
part of encoding version 13. My recollection is that a scrub on a directory
is only considered complete when all of its descendants have scrubbed, but
maybe I'm misremembering and we'll happily do one at a time.

I think I saw elsewhere that dealing with the upgrades is now in progress,
though, so I presume some other solution came to hand.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs snapshot format upgrade
On Wed, Apr 11, 2018 at 3:34 AM, Gregory Farnum wrote:
> Don't we have a (recursive!) last_scrub_[stamp|version] on all
> directories? There's not (yet) a mechanism for associating that with
> specific data versions like you describe here, but for a one-time
> upgrade with unsupported features I don't think we need anything too
> sophisticated.
> -Greg

No, we don't. Besides, normal recursive stats (which record the last
update) do not work for this case. We need a recursive stat that tracks
the oldest update across all directories.

Regards
Yan, Zheng
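To make the distinction concrete, here is a minimal C++ sketch (an illustration, not Ceph code) contrasting two ways of aggregating a per-directory version up a tree. `Dir`, `newest_in_subtree` and `oldest_in_subtree` are hypothetical names, and versions are plain integers. A normal rstat keeps the newest value, so one updated directory makes the whole subtree look current; only a minimum over the subtree can show that every directory has been brought up to a given version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified directory tree: each directory records the
// metadata format version it was last brought up to (e.g. by scrub).
struct Dir {
  std::uint64_t own_version;
  std::vector<Dir> children;
};

// Ordinary rstat-style aggregation: the NEWEST version in the subtree.
// One freshly converted directory is enough to make the root look current.
std::uint64_t newest_in_subtree(const Dir& d) {
  std::uint64_t v = d.own_version;
  for (const Dir& c : d.children) v = std::max(v, newest_in_subtree(c));
  return v;
}

// The aggregation needed here: the OLDEST version in the subtree.
// If this is >= N at the root, every directory has been brought up to N.
std::uint64_t oldest_in_subtree(const Dir& d) {
  std::uint64_t v = d.own_version;
  for (const Dir& c : d.children) v = std::min(v, oldest_in_subtree(c));
  return v;
}

// Usage: root and one child are at version 2, a grandchild is still at 1.
void demo() {
  Dir root{2, {Dir{2, {Dir{1, {}}}}}};
  assert(newest_in_subtree(root) == 2);  // misleadingly "up to date"
  assert(oldest_in_subtree(root) == 1);  // exposes the stale grandchild
}
```

In a real filesystem the minimum would have to be maintained incrementally as directories are updated, rather than recomputed by a full walk as in this sketch.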
Re: [ceph-users] cephfs snapshot format upgrade
On Wed, Apr 11, 2018 at 10:10 AM, Sage Weil wrote:
> Does scrub actually do the conversion already, though, or does that need
> to be implemented?

It needs to be implemented.

Regards
Yan, Zheng
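A rough sketch of what scrub-driven conversion could look like, assuming a simplified tree in which each directory may carry an old-format or new-format snaprealm. `RealmFormat`, `scrub_and_convert` and `fully_converted` are hypothetical names, not Ceph interfaces; the real conversion would decode the old encoding and re-encode it, which is reduced here to changing an enum value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a directory may carry a snaprealm in the old
// (pre-Mimic) or new format, or none at all.
enum class RealmFormat { None, Old, New };

struct Dir {
  RealmFormat realm;
  std::vector<Dir> children;
};

// Visit every directory once and rewrite any old-format snaprealm in the
// new format. Returns how many realms were converted during this pass.
std::size_t scrub_and_convert(Dir& d) {
  std::size_t converted = 0;
  if (d.realm == RealmFormat::Old) {
    d.realm = RealmFormat::New;  // stand-in for decode-old / encode-new
    ++converted;
  }
  for (Dir& c : d.children) converted += scrub_and_convert(c);
  return converted;
}

// True only if no old-format realm remains anywhere in the subtree, i.e.
// the condition a completed whole-filesystem scrub is meant to establish.
bool fully_converted(const Dir& d) {
  if (d.realm == RealmFormat::Old) return false;
  for (const Dir& c : d.children)
    if (!fully_converted(c)) return false;
  return true;
}

void demo() {
  Dir root{RealmFormat::New,
           {Dir{RealmFormat::Old, {}},
            Dir{RealmFormat::None, {Dir{RealmFormat::Old, {}}}}}};
  assert(!fully_converted(root));
  assert(scrub_and_convert(root) == 2);
  assert(fully_converted(root));
  assert(scrub_and_convert(root) == 0);  // a second pass finds nothing to do
}
```

Only a pass that reaches every directory establishes `fully_converted`; that is the property a recorded whole-filesystem scrub would have to guarantee before multiple active MDSs are allowed.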
Re: [ceph-users] cephfs snapshot format upgrade
On Tue, 10 Apr 2018, Patrick Donnelly wrote:
> This sounds like the right approach to me. The mons should also be
> capable of performing the same test and raising a health error stating
> that pre-Mimic MDSs must be started and the number of actives reduced
> to 1.

Does scrub actually do the conversion already, though, or does that need
to be implemented?

sage
Re: [ceph-users] cephfs snapshot format upgrade
On Tue, Apr 10, 2018 at 5:54 AM, John Spray wrote:
> For this particular situation, we'd also need to protect against
> people who had enabled multimds and snapshots experimentally, with an
> MDS startup check like:
>     ((ever_allowed_features & CEPH_MDSMAP_ALLOW_SNAPS) &&
>      (allows_multimds() || in.size() > 1)) &&
>     latest_scrubbed_version < mimic

This sounds like the right approach to me. The mons should also be
capable of performing the same test and raising a health error stating
that pre-Mimic MDSs must be started and the number of actives reduced
to 1.

--
Patrick Donnelly
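The monitor-side check proposed here might look roughly like the following C++ sketch. `FsMapView`, `check_snaprealm_upgrade` and the field names are hypothetical; `max_mds > 1` stands in for `allows_multimds()` and `num_in` for `in.size()` from John's expression, `latest_scrubbed_version` is the FSMap field John proposes (not an existing one), and releases are modelled by major version number (Mimic is 13.x).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified view of the fields the monitor would consult.
struct FsMapView {
  bool ever_allowed_snaps;                // snapshots were enabled at some point
  int max_mds;                            // configured number of active ranks
  int num_in;                             // ranks currently "in"
  std::uint32_t latest_scrubbed_version;  // release the last full scrub ran on
};

constexpr std::uint32_t kMimic = 13;  // illustrative: Mimic's major version

// Monitor-side counterpart of the MDS startup check: if snapshots were ever
// allowed, multiple actives are configured or running, and no full scrub has
// been recorded on Mimic, return a health error message; otherwise nothing.
std::optional<std::string> check_snaprealm_upgrade(const FsMapView& m) {
  const bool multi_active = m.max_mds > 1 || m.num_in > 1;
  if (m.ever_allowed_snaps && multi_active &&
      m.latest_scrubbed_version < kMimic) {
    return std::string(
        "old-format snaprealms may exist: start pre-Mimic MDS daemons "
        "and reduce the number of active MDSs to 1");
  }
  return std::nullopt;
}

void demo() {
  FsMapView unscrubbed_multi{true, 2, 2, 12};
  assert(check_snaprealm_upgrade(unscrubbed_multi).has_value());
  FsMapView single{true, 1, 1, 12};
  assert(!check_snaprealm_upgrade(single).has_value());
  FsMapView scrubbed{true, 2, 2, kMimic};
  assert(!check_snaprealm_upgrade(scrubbed).has_value());
}
```

Returning an optional message keeps the check a pure function of the map, so the same logic could in principle be evaluated by both the monitors and the MDS at startup.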
Re: [ceph-users] cephfs snapshot format upgrade
On Tue, Apr 10, 2018 at 5:54 AM, John Spray wrote:
> Maybe we need the MDS to be able to report a complete whole-filesystem
> scrub to the monitor, and record a field like
> "latest_scrubbed_version" in FSMap, so that we can be sure that all
> the filesystem metadata has been brought up to a certain version
> before enabling certain features? So we'd then have a
> "latest_scrubbed_version >= mimic" test before enabling multiple
> active daemons.

Don't we have a (recursive!) last_scrub_[stamp|version] on all
directories? There's not (yet) a mechanism for associating that with
specific data versions like you describe here, but for a one-time
upgrade with unsupported features I don't think we need anything too
sophisticated.
-Greg
Re: [ceph-users] cephfs snapshot format upgrade
On Tue, Apr 10, 2018 at 1:44 PM, Yan, Zheng wrote:
> Hello
>
> To simplify snapshot handling in a multiple active MDS setup, we changed
> the format of snaprealm in mimic dev.
> https://github.com/ceph/ceph/pull/16779.
>
> The new version MDS can handle old-format snaprealms in a single active
> setup. It can also convert an old-format snaprealm to the new format when
> the snaprealm is modified. The problem is that the new version MDS cannot
> properly handle old-format snaprealms in a multiple active setup. It may
> crash when it encounters an old-format snaprealm. For an existing
> filesystem with snapshots, upgrading the MDS to mimic seems to be no
> problem at first glance. But if the user later enables multiple active
> MDSs, the MDS may crash continuously, and there is no easy way to switch
> back to a single active MDS.
>
> I don't have a clear idea how to handle this situation. I can think of a
> few options.
>
> 1. Forbid multiple active MDSs until all old snapshots are deleted or
> all snaprealms are converted to the new format. Format conversion
> requires traversing the whole filesystem tree. Not easy to
> implement.

This has been a general problem with metadata format changes: we can
never know if all the metadata in a filesystem has been brought up to
a particular version. Scrubbing (where scrub does the updates) should
be the answer, but we don't have the mechanism for recording/ensuring
the scrub has really happened.

Maybe we need the MDS to be able to report a complete whole-filesystem
scrub to the monitor, and record a field like
"latest_scrubbed_version" in FSMap, so that we can be sure that all
the filesystem metadata has been brought up to a certain version
before enabling certain features? So we'd then have a
"latest_scrubbed_version >= mimic" test before enabling multiple
active daemons.

For this particular situation, we'd also need to protect against
people who had enabled multimds and snapshots experimentally, with an
MDS startup check like:
    ((ever_allowed_features & CEPH_MDSMAP_ALLOW_SNAPS) &&
     (allows_multimds() || in.size() > 1)) &&
    latest_scrubbed_version < mimic

John

> 2. Ask users to delete all old snapshots before upgrading to mimic,
> and make the MDS just ignore old-format snaprealms.
>
>
> Regards
> Yan, Zheng
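John's proposal has two parts: record in FSMap that a complete scrub has run on a given release, and refuse to enable multiple actives until that recorded version is at least Mimic. Below is a minimal C++ sketch under those assumptions; `FsMapSketch`, `record_complete_scrub` and `try_set_max_mds` are hypothetical names, not existing Ceph interfaces, and releases are modelled by major version number (Mimic is 13.x).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical FSMap fragment for the proposal; names are illustrative only.
constexpr std::uint32_t kMimic = 13;

struct FsMapSketch {
  std::uint32_t latest_scrubbed_version = 0;  // 0: no complete scrub recorded
  int max_mds = 1;
};

// Called when an MDS reports that a scrub of the WHOLE filesystem finished
// while running release `running_version`. Partial scrubs must not call this.
void record_complete_scrub(FsMapSketch& m, std::uint32_t running_version) {
  m.latest_scrubbed_version = running_version;
}

// Gate for enabling multiple active MDS daemons: refuse until a complete
// scrub has brought all metadata up to at least Mimic.
bool try_set_max_mds(FsMapSketch& m, int requested, std::string* err) {
  if (requested > 1 && m.latest_scrubbed_version < kMimic) {
    *err = "complete a full scrub on Mimic before enabling multiple actives";
    return false;
  }
  m.max_mds = requested;
  return true;
}

void demo() {
  FsMapSketch m;
  std::string err;
  assert(!try_set_max_mds(m, 2, &err));  // refused: no Mimic scrub recorded
  assert(m.max_mds == 1);
  record_complete_scrub(m, kMimic);
  assert(try_set_max_mds(m, 2, &err));   // allowed after the full scrub
  assert(m.max_mds == 2);
}
```

The gate only helps if `record_complete_scrub` is called for scrubs that truly covered every directory; a partial scrub advancing the field would reopen the crash scenario described above.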