Re: [ceph-users] cephfs snapshot format upgrade

2018-04-11 Thread Gregory Farnum
On Tue, Apr 10, 2018 at 8:50 PM, Yan, Zheng  wrote:
> On Wed, Apr 11, 2018 at 3:34 AM, Gregory Farnum  wrote:
>>
>> Don't we have a (recursive!) last_scrub_[stamp|version] on all
>> directories? There's not (yet) a mechanism for associating that with
>> specific data versions like you describe here, but for a one-time
>> upgrade with unsupported features I don't think we need anything too
>> sophisticated.
>> -Greg
>>
> No, we don't.  Besides, normal recursive stats (which record the last
> update) do not work for this case. We need a recursive stat that
> tracks the oldest update across all directories.

Well, inode_t has a last_scrub_version and last_scrub_stamp member.
They're part of encoding version 13. My recollection is that a scrub
on a directory is only considered complete when all of its descendants
have scrubbed, but maybe I'm misremembering and we'll happily do one
at a time.

I think I saw elsewhere that dealing with the upgrades is now
in-progress though so I presume some other solution came to hand.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs snapshot format upgrade

2018-04-10 Thread Yan, Zheng
On Wed, Apr 11, 2018 at 3:34 AM, Gregory Farnum  wrote:
> Don't we have a (recursive!) last_scrub_[stamp|version] on all
> directories? There's not (yet) a mechanism for associating that with
> specific data versions like you describe here, but for a one-time
> upgrade with unsupported features I don't think we need anything too
> sophisticated.
> -Greg
>
No, we don't.  Besides, normal recursive stats (which record the last
update) do not work for this case. We need a recursive stat that
tracks the oldest update across all directories.
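
Something like this, to illustrate the difference (toy Python with invented
names, not actual Ceph code): an rstat-style aggregate tracks the newest
change in a subtree, which can't prove the subtree is fully converted,
while a minimum-over-descendants aggregate can.

```python
# Toy directory tree: each inode carries a format_version; directories
# have children. "newest" is rstat-like; "oldest" is what an upgrade
# check would actually need.

class Inode:
    def __init__(self, name, format_version, children=()):
        self.name = name
        self.format_version = format_version
        self.children = list(children)

def newest_version(inode):
    # rstat-style: max over the subtree (tracks the most recent update)
    return max([inode.format_version] +
               [newest_version(c) for c in inode.children])

def oldest_version(inode):
    # min over the subtree: exposes any unconverted descendant
    return min([inode.format_version] +
               [oldest_version(c) for c in inode.children])

root = Inode("/", 2, [
    Inode("a", 2, [Inode("a/x", 1)]),   # one old-format inode deep down
    Inode("b", 2),
])

assert newest_version(root) == 2   # looks fully upgraded
assert oldest_version(root) == 1   # the min reveals the stale inode
```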

Regards
Yan, Zheng



Re: [ceph-users] cephfs snapshot format upgrade

2018-04-10 Thread Yan, Zheng
On Wed, Apr 11, 2018 at 10:10 AM, Sage Weil  wrote:
> Does scrub actually do the conversion already, though, or does that need
> to be implemented?
>

It needs to be implemented.

Regards
Yan, Zheng

> sage


Re: [ceph-users] cephfs snapshot format upgrade

2018-04-10 Thread Sage Weil
On Tue, 10 Apr 2018, Patrick Donnelly wrote:
> This sounds like the right approach to me. The mons should also be
> capable of performing the same test and raising a health error that
> pre-Mimic MDSs must be started and the number of actives be reduced to
> 1.

Does scrub actually do the conversion already, though, or does that need 
to be implemented?

sage


Re: [ceph-users] cephfs snapshot format upgrade

2018-04-10 Thread Patrick Donnelly
On Tue, Apr 10, 2018 at 5:54 AM, John Spray  wrote:
> This has been a general problem with metadata format changes: we can
> never know if all the metadata in a filesystem has been brought up to
> a particular version.  Scrubbing (where scrub does the updates) should
> be the answer, but we don't have the mechanism for recording/ensuring
> the scrub has really happened.
>
> Maybe we need the MDS to be able to report a complete whole-filesystem
> scrub to the monitor, and record a field like
> "latest_scrubbed_version" in FSMap, so that we can be sure that all
> the filesystem metadata has been brought up to a certain version
> before enabling certain features?  So we'd then have a
> "latest_scrubbed_version >= mimic" test before enabling multiple
> active daemons.
>
> For this particular situation, we'd also need to protect against
> people who had enabled multimds and snapshots experimentally, with an
> MDS startup check like:
>  ((ever_allowed_features & CEPH_MDSMAP_ALLOW_SNAPS) &&
> (allows_multimds() || in.size() >1)) && latest_scrubbed_version <
> mimic

This sounds like the right approach to me. The mons should also be
capable of performing the same test and raising a health error saying
that pre-Mimic MDSs must be started and the number of actives reduced
to 1.

-- 
Patrick Donnelly


Re: [ceph-users] cephfs snapshot format upgrade

2018-04-10 Thread Gregory Farnum
On Tue, Apr 10, 2018 at 5:54 AM, John Spray  wrote:
> This has been a general problem with metadata format changes: we can
> never know if all the metadata in a filesystem has been brought up to
> a particular version.  Scrubbing (where scrub does the updates) should
> be the answer, but we don't have the mechanism for recording/ensuring
> the scrub has really happened.
>
> Maybe we need the MDS to be able to report a complete whole-filesystem
> scrub to the monitor, and record a field like
> "latest_scrubbed_version" in FSMap, so that we can be sure that all
> the filesystem metadata has been brought up to a certain version
> before enabling certain features?  So we'd then have a
> "latest_scrubbed_version >= mimic" test before enabling multiple
> active daemons.

Don't we have a (recursive!) last_scrub_[stamp|version] on all
directories? There's not (yet) a mechanism for associating that with
specific data versions like you describe here, but for a one-time
upgrade with unsupported features I don't think we need anything too
sophisticated.
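
If that recollection is right, the bottom-up rule is itself the
recursive guarantee: a directory only gets stamped after everything
beneath it. A rough sketch of that rule (plain Python with invented
names, not the actual MDS scrub code):

```python
# Post-order scrub: a directory's last_scrub_version is advanced only
# after every descendant has been scrubbed, so checking the root is
# enough to know the whole tree was visited at that version.

def scrub(inode, version, scrubbed):
    for child in inode.get("children", []):
        scrub(child, version, scrubbed)      # descendants first
    inode["last_scrub_version"] = version    # then the inode itself
    scrubbed.append(inode["name"])

tree = {"name": "/", "children": [
    {"name": "a", "children": [{"name": "a/x"}]},
    {"name": "b"},
]}

order = []
scrub(tree, version=13, scrubbed=order)

assert order == ["a/x", "a", "b", "/"]       # strictly bottom-up
assert tree["last_scrub_version"] == 13      # root stamped last
```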
-Greg



Re: [ceph-users] cephfs snapshot format upgrade

2018-04-10 Thread John Spray
On Tue, Apr 10, 2018 at 1:44 PM, Yan, Zheng  wrote:
> Hello
>
> To simplify snapshot handling in multiple active MDS setups, we changed
> the snaprealm format in the mimic dev branch:
> https://github.com/ceph/ceph/pull/16779.
>
> The new-version MDS can handle old-format snaprealms in a single active
> setup, and it converts an old-format snaprealm to the new format when
> the snaprealm is modified. The problem is that the new-version MDS
> cannot properly handle old-format snaprealms in a multiple active
> setup; it may crash when it encounters one. For an existing filesystem
> with snapshots, upgrading the MDS to mimic seems fine at first glance,
> but if the user later enables multiple active MDSs, the MDSs may crash
> continuously, with no easy way to switch back to a single active MDS.
>
> I don't have a clear idea how to handle this situation. I can think of
> a few options.
>
> 1. Forbid multiple actives until all old snapshots are deleted or all
> snaprealms are converted to the new format. Format conversion requires
> traversing the whole filesystem tree, which is not easy to implement.

This has been a general problem with metadata format changes: we can
never know if all the metadata in a filesystem has been brought up to
a particular version.  Scrubbing (where scrub does the updates) should
be the answer, but we don't have the mechanism for recording/ensuring
the scrub has really happened.

Maybe we need the MDS to be able to report a complete whole-filesystem
scrub to the monitor, and record a field like
"latest_scrubbed_version" in FSMap, so that we can be sure that all
the filesystem metadata has been brought up to a certain version
before enabling certain features?  So we'd then have a
"latest_scrubbed_version >= mimic" test before enabling multiple
active daemons.
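
To illustrate the monitor-side gate (a sketch in Python; the FSMap field
name and the version constant are assumptions taken from this proposal,
not real Ceph code):

```python
MIMIC = 13  # hypothetical numeric rank for the mimic metadata format

class FSMap:
    def __init__(self, latest_scrubbed_version):
        # proposed field: highest version a *complete* whole-FS scrub reached
        self.latest_scrubbed_version = latest_scrubbed_version
        self.max_mds = 1

def set_max_mds(fsmap, n):
    # monitor-side gate: allow multiple actives only once a complete
    # scrub has proved all metadata is at the mimic format
    if n > 1 and fsmap.latest_scrubbed_version < MIMIC:
        raise RuntimeError(
            "complete a mimic-level scrub before enabling multiple active MDSs")
    fsmap.max_mds = n

old = FSMap(latest_scrubbed_version=12)
try:
    set_max_mds(old, 2)       # refused: scrub hasn't caught up
    raised = False
except RuntimeError:
    raised = True
assert raised and old.max_mds == 1

new = FSMap(latest_scrubbed_version=13)
set_max_mds(new, 2)           # allowed after a complete mimic scrub
assert new.max_mds == 2
```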

For this particular situation, we'd also need to protect against
people who had enabled multimds and snapshots experimentally, with an
MDS startup check like:
  ((ever_allowed_features & CEPH_MDSMAP_ALLOW_SNAPS) &&
   (allows_multimds() || in.size() > 1)) && latest_scrubbed_version < mimic
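
Spelled out as a boolean function (same caveat: a sketch only, with
names taken from the pseudocode above and made-up constant values):

```python
CEPH_MDSMAP_ALLOW_SNAPS = 1 << 1   # assumed flag value, for illustration
MIMIC = 13                          # hypothetical version rank

def must_refuse_startup(ever_allowed_features, allows_multimds,
                        num_in, latest_scrubbed_version):
    # refuse if snapshots were ever allowed AND the cluster is (or was)
    # multi-active AND no complete mimic-level scrub has been recorded
    return (bool(ever_allowed_features & CEPH_MDSMAP_ALLOW_SNAPS)
            and (allows_multimds or num_in > 1)
            and latest_scrubbed_version < MIMIC)

# experimental snaps + multimds, never scrubbed: refuse startup
assert must_refuse_startup(CEPH_MDSMAP_ALLOW_SNAPS, True, 2, 12)
# snapshots never allowed: fine
assert not must_refuse_startup(0, True, 2, 12)
# complete mimic-level scrub already recorded: fine
assert not must_refuse_startup(CEPH_MDSMAP_ALLOW_SNAPS, True, 2, 13)
```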

John

> 2. Ask users to delete all old snapshots before upgrading to mimic,
> and make the MDS simply ignore old-format snaprealms.
>
>
> Regards
> Yan, Zheng