Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread adrien.geor...@cc.in2p3.fr

Hmm, it looks like I restarted everything except the MDSs...
So it's the same issue! That's why the MDSs killed themselves during the
reboot of one of the monitors: those MDSs were still running 12.2.2.


Thanks Dan!

Adrien

Le 28/03/2018 à 16:43, Dan van der Ster a écrit :

Do you have the startup banners for mds.cccephadm14 and 15? It sure
looks like they were running 12.2.2 with the "not writeable with
daemon features" error.

-- dan


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread Dan van der Ster
Do you have the startup banners for mds.cccephadm14 and 15? It sure
looks like they were running 12.2.2 with the "not writeable with
daemon features" error.

-- dan


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread adrien.geor...@cc.in2p3.fr

Hi,

All Ceph services were on version 12.2.4.

Adrien


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread Dan van der Ster
Hi,

Which versions were those MDSs running before and after the standby MDS restarted?

Cheers, Dan




Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread adrien.geor...@cc.in2p3.fr

Hi,

I just had the same issue with our 12.2.4 cluster, but not during the
upgrade.
One of our 3 monitors restarted (the one with a standby MDS) and the 2
other active MDSs killed themselves:


2018-03-28 09:36:24.376888 7f910bc0f700  0 mds.cccephadm14 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.376903 7f910bc0f700  1 mds.cccephadm14 suicide. wanted state up:active
2018-03-28 09:36:25.379607 7f910bc0f700  1 mds.1.62 shutdown: shutting down rank 1



2018-03-28 09:36:24.375867 7fad455bf700  0 mds.cccephadm15 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-03-28 09:36:24.375883 7fad455bf700  1 mds.cccephadm15 suicide. wanted state up:active
2018-03-28 09:36:25.377633 7fad455bf700  1 mds.0.50 shutdown: shutting down rank 0


I had to manually restart the MDS services to get them working again.

Adrien
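To make the mismatch in the two log lines above easier to see, here is a small standalone Python sketch (not Ceph's own parser, just an illustration) that extracts the `incompat={...}` feature lists from the two compatsets and diffs them:

```python
import re

def parse_incompat(compatset: str) -> dict:
    """Parse 'incompat={1=base v0.20,2=client writeable ranges,...}' into {id: name}."""
    body = re.search(r"incompat=\{([^}]*)\}", compatset).group(1)
    features = {}
    for item in body.split(","):
        fid, name = item.split("=", 1)
        features[int(fid)] = name
    return features

# The two compatsets copied from the mds.cccephadm14 log line above.
mdsmap = ("incompat={1=base v0.20,2=client writeable ranges,"
          "3=default file layouts on dirs,4=dir inode in separate object,"
          "5=mds uses versioned encoding,6=dirfrag is stored in omap,"
          "8=no anchor table,9=file layout v2}")
daemon = ("incompat={1=base v0.20,2=client writeable ranges,"
          "3=default file layouts on dirs,4=dir inode in separate object,"
          "5=mds uses versioned encoding,6=dirfrag is stored in omap,"
          "7=mds uses inline data,8=file layout v2}")

m, d = parse_incompat(mdsmap), parse_incompat(daemon)
# Map features the daemon does not advertise at all, and ids whose
# meaning differs between the two sides.
missing = {fid: name for fid, name in m.items() if fid not in d}
conflicting = {fid: (m[fid], d[fid]) for fid in m.keys() & d.keys() if m[fid] != d[fid]}
print(missing)       # {9: 'file layout v2'}
print(conflicting)   # {8: ('no anchor table', 'file layout v2')}
```

The diff shows why the daemon refuses the map: the map requires feature id 9, which the daemon does not advertise, and id 8 names a different feature on each side. That renumbering of the incompat flags is the subject of the rest of the thread.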



Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-21 Thread Martin Palma
Just ran into this problem on our production cluster.

It would have been nice if the release notes of 12.2.4 had been
adapted to inform users about this.

Best,
Martin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-14 Thread Gregory Farnum
On Wed, Mar 14, 2018 at 12:41 PM, Lars Marowsky-Bree wrote:
> Would such a patch be accepted if we ended up pursuing this? Any
> suggestions on how to best go about this?

It'd be ugly, but you'd have to set it up so that
* new MDSes advertise the old set of required values
* but can identify when all the MDSes are new
* then mark somewhere that they can use the correct values
* then switch to the proper requirements

I don't remember the details of this CompatSet code any more, and it's
definitely made trickier by the MDS having no permanent local state.
Since we do luckily have both the IDs and the strings, you might be
able to do something in the MDSMonitor to identify whether booting
MDSes have "too-old", "old-featureset-but-support-new-feature", or
"new, correct feature advertising" and then either massage that
incoming message down to the "old-featureset-but-support-new-feature"
(if not all the MDSes are new) or do an auto-upgrade of the required
features in the map. And you might also need compatibility code in the
MDS to make sure it sends out the appropriate bits on connection, but
I *think* the CompatSet checks are only done on the monitor and when
an MDS receives an MDSMap.
-Greg
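Greg's four bullet points can be modeled in a few lines. The following is a toy Python model, not the actual Ceph CompatSet code (the function names and data layout are invented for illustration), of a monitor that keeps advertising the old required feature set until every booted MDS supports the new one, and only then switches the map to the new requirements. Matching is done by feature name rather than by id, since the ids were renumbered between releases:

```python
# Feature names taken from the log excerpts earlier in the thread.
OLD_INCOMPAT = frozenset({
    "base v0.20", "client writeable ranges", "default file layouts on dirs",
    "dir inode in separate object", "mds uses versioned encoding",
    "dirfrag is stored in omap", "mds uses inline data", "file layout v2",
})
# The disambiguated set drops "mds uses inline data" and adds "no anchor table".
NEW_INCOMPAT = OLD_INCOMPAT - {"mds uses inline data"} | {"no anchor table"}

def writeable(map_incompat: frozenset, daemon_incompat: frozenset) -> bool:
    # A map is writeable by a daemon only if the daemon understands every
    # feature the map requires; otherwise the daemon kills itself.
    return map_incompat <= daemon_incompat

def required_features(booted_daemons: dict) -> frozenset:
    # booted_daemons: {mds_name: set of incompat features the daemon supports}.
    # Advertise the new requirements only once *all* daemons support them;
    # until then, massage the map down to the old requirements.
    if all(writeable(NEW_INCOMPAT, feats) for feats in booted_daemons.values()):
        return NEW_INCOMPAT
    return OLD_INCOMPAT

# Mixed cluster: one upgraded MDS, one old one -> keep the old requirements,
# so the old MDS does not suicide.
mixed = {"a": OLD_INCOMPAT | NEW_INCOMPAT, "b": OLD_INCOMPAT}
assert required_features(mixed) == OLD_INCOMPAT
# All daemons upgraded -> safe to switch to the new requirements.
done = {"a": OLD_INCOMPAT | NEW_INCOMPAT, "b": OLD_INCOMPAT | NEW_INCOMPAT}
assert required_features(done) == NEW_INCOMPAT
```

This captures the "massage down until all MDSes are new, then auto-upgrade" idea; the real fix would also need the persistence and message-compatibility details Greg mentions.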


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-14 Thread Lars Marowsky-Bree
On 2018-03-14T06:57:08, Patrick Donnelly  wrote:

> Yes. But the real outcome is not "no MDS [is] active" but "some or all
> metadata I/O will pause" -- and there is no avoiding that. During an
> MDS upgrade, a standby must take over the MDS being shutdown (and
> upgraded).  During takeover, metadata I/O will briefly pause as the
> rank is unavailable. (Specifically, no other rank can obtain locks or
> communicate with the "failed" rank; so metadata I/O will necessarily
> pause until a standby takes over.) Single active vs. multiple active
> upgrade makes little difference in this outcome.

Fair, except that there's no standby MDS at this time in case the update
goes wrong.

> > Is another approach theoretically feasible? Have the updated MDS only go
> > into the incompatible mode once there's a quorum of new ones available,
> > or something?
> I believe so, yes. That option wasn't explored for this patch because
> it was just disambiguating the compatibility flags and the full
> side-effects weren't realized.

Would such a patch be accepted if we ended up pursuing this? Any
suggestions on how to best go about this?

Anything that requires magic sauce on updates beyond the normal "MONs
first, rolling through" makes me twitchy and tends to end with at least
a few customers getting it not quite right ;-)


Regards,
Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-14 Thread Patrick Donnelly
On Wed, Mar 14, 2018 at 5:48 AM, Lars Marowsky-Bree  wrote:
> On 2018-02-28T02:38:34, Patrick Donnelly  wrote:
>
>> I think it will be necessary to reduce the actives to 1 (max_mds -> 1;
>> deactivate other ranks), shutdown standbys, upgrade the single active,
>> then upgrade/start the standbys.
>>
>> Unfortunately this didn't get flagged in upgrade testing. Thanks for
>> the report Dan.
>
> This means that - when the single active is being updated - there's a
> time when there's no MDS active, right?

Yes. But the real outcome is not "no MDS [is] active" but "some or all
metadata I/O will pause" -- and there is no avoiding that. During an
MDS upgrade, a standby must take over the MDS being shut down (and
upgraded).  During takeover, metadata I/O will briefly pause as the
rank is unavailable. (Specifically, no other rank can obtain locks or
communicate with the "failed" rank; so metadata I/O will necessarily
pause until a standby takes over.) Single active vs. multiple active
upgrade makes little difference in this outcome.

> Is another approach theoretically feasible? Have the updated MDS only go
> into the incompatible mode once there's a quorum of new ones available,
> or something?

I believe so, yes. That option wasn't explored for this patch because
it was just disambiguating the compatibility flags and the full
side-effects weren't realized.

-- 
Patrick Donnelly


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-14 Thread Dietmar Rieder
On 03/14/2018 01:48 PM, Lars Marowsky-Bree wrote:
> On 2018-02-28T02:38:34, Patrick Donnelly  wrote:
> 
>> I think it will be necessary to reduce the actives to 1 (max_mds -> 1;
>> deactivate other ranks), shutdown standbys, upgrade the single active,
>> then upgrade/start the standbys.
>>
>> Unfortunately this didn't get flagged in upgrade testing. Thanks for
>> the report Dan.
> 
> This means that - when the single active is being updated - there's a
> time when there's no MDS active, right?
> 
> Is another approach theoretically feasible? Have the updated MDS only go
> into the incompatible mode once there's a quorum of new ones available,
> or something?
> 
> (From the point of view of a distributed system, this is double plus
> ungood.)

here is what I did and what worked without any problem:

we have mons and mds on the same hosts, 3 hosts in total

1. stop the 2 non-active mds
2. update ceph on all 3 hosts
3. restart the active mds
4. start the mds on the others

HTH
  Dietmar

-- 
_
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Email: dietmar.rie...@i-med.ac.at
Web:   http://www.icbi.at






Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-14 Thread Lars Marowsky-Bree
On 2018-02-28T02:38:34, Patrick Donnelly  wrote:

> I think it will be necessary to reduce the actives to 1 (max_mds -> 1;
> deactivate other ranks), shutdown standbys, upgrade the single active,
> then upgrade/start the standbys.
> 
> Unfortunately this didn't get flagged in upgrade testing. Thanks for
> the report Dan.

This means that - when the single active is being updated - there's a
time when there's no MDS active, right?

Is another approach theoretically feasible? Have the updated MDS only go
into the incompatible mode once there's a quorum of new ones available,
or something?

(From the point of view of a distributed system, this is double plus
ungood.)



Regards,
Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread John Spray
On Wed, Feb 28, 2018 at 11:05 AM, John Spray  wrote:
> On Wed, Feb 28, 2018 at 9:37 AM, Dan van der Ster  wrote:
>> Hi all,
>>
>> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
>> OSD's updated fine.
>>
>> When updating the MDS's (we have 2 active and 1 standby), I started
>> with the standby.
>>
>> At the moment the standby MDS restarted into 12.2.4 [1], both active
>> MDSs (still running 12.2.2) suicided like this:
>>
>> 2018-02-28 10:25:22.761413 7f03da1b9700  0 mds.cephdwightmds0
>> handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
>> v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
>> inode in separate object,5=mds uses versioned encoding,6=dirfrag is
>> stored in omap,8=no anchor table,9=file layout v2} not writeable with
>> daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
>> writeable ranges,3=default file layouts on dirs,4=dir inode in
>> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
>> omap,7=mds uses inline data,8=file layout v2}, killing myself
>> 2018-02-28 10:25:22.761429 7f03da1b9700  1 mds.cephdwightmds0 suicide.
>> wanted state up:active
>> 2018-02-28 10:25:23.763226 7f03da1b9700  1 mds.0.18147 shutdown:
>> shutting down rank 0
>>
>>
>> 2018-02-28 10:25:22.761590 7f11df538700  0 mds.cephdwightmds1
>> handle_mds_map mdsmap compatset compat={},rocompat={}
>> ,incompat={1=base v0.20,2=client writeable ranges,3=default file
>> layouts on dirs,4=dir inode in separate object,5=m
>> ds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
>> table,9=file layout v2} not writeable with daemo
>> n features compat={},rocompat={},incompat={1=base v0.20,2=client
>> writeable ranges,3=default file layouts on dirs,4=
>> dir inode in separate object,5=mds uses versioned encoding,6=dirfrag
>> is stored in omap,7=mds uses inline data,8=fil
>> e layout v2}, killing myself
>> 2018-02-28 10:25:22.761613 7f11df538700  1 mds.cephdwightmds1 suicide.
>> wanted state up:active
>> 2018-02-28 10:25:23.765653 7f11df538700  1 mds.1.18366 shutdown:
>> shutting down rank 1
>
> That's not good!
>
> From looking at the commits between 12.2.2 and 12.2.4, this one looks
> suspicious:
>
> commit ddba907279719631903e3a20543056d81d176a1b
> Author: Yan, Zheng 
> Date:   Tue Oct 31 16:56:51 2017 +0800
>
> mds: fix MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 definition
>
> Fixes: http://tracker.ceph.com/issues/21985
> Signed-off-by: "Yan, Zheng" 
> (cherry picked from commit 6c1543dfc55d6db8493535b9b62a30236cf8c638)

Apologies for the noise, my mail client hadn't loaded the earlier
responses in which this was already pointed out.

John

> John
>
>
>
>>
>>
>> The cephfs cluster was down until I updated all MDS's to 12.2.4 --
>> then they restarted cleanly.
>>
>> Looks like a pretty serious bug??!!
>>
>> Cheers, Dan
>>
>>
>> [1] here is the standby restarting, 4 seconds before the active MDS's 
>> suicided:
>>
>> 2018-02-28 10:25:18.222865 7f9f1ea3b1c0  0 set uid:gid to 167:167 (ceph:ceph)
>> 2018-02-28 10:25:18.222892 7f9f1ea3b1c0  0 ceph version 12.2.4
>> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
>> (unknown), pid 10648


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread John Spray
On Wed, Feb 28, 2018 at 9:37 AM, Dan van der Ster  wrote:
> Hi all,
>
> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
> OSD's updated fine.
>
> When updating the MDS's (we have 2 active and 1 standby), I started
> with the standby.
>
> At the moment the standby MDS restarted into 12.2.4 [1], both active
> MDSs (still running 12.2.2) suicided like this:
>
> 2018-02-28 10:25:22.761413 7f03da1b9700  0 mds.cephdwightmds0
> handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
> v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
> inode in separate object,5=mds uses versioned encoding,6=dirfrag is
> stored in omap,8=no anchor table,9=file layout v2} not writeable with
> daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in
> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> omap,7=mds uses inline data,8=file layout v2}, killing myself
> 2018-02-28 10:25:22.761429 7f03da1b9700  1 mds.cephdwightmds0 suicide.
> wanted state up:active
> 2018-02-28 10:25:23.763226 7f03da1b9700  1 mds.0.18147 shutdown:
> shutting down rank 0
>
>
> 2018-02-28 10:25:22.761590 7f11df538700  0 mds.cephdwightmds1
> handle_mds_map mdsmap compatset compat={},rocompat={}
> ,incompat={1=base v0.20,2=client writeable ranges,3=default file
> layouts on dirs,4=dir inode in separate object,5=m
> ds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
> table,9=file layout v2} not writeable with daemo
> n features compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=
> dir inode in separate object,5=mds uses versioned encoding,6=dirfrag
> is stored in omap,7=mds uses inline data,8=fil
> e layout v2}, killing myself
> 2018-02-28 10:25:22.761613 7f11df538700  1 mds.cephdwightmds1 suicide.
> wanted state up:active
> 2018-02-28 10:25:23.765653 7f11df538700  1 mds.1.18366 shutdown:
> shutting down rank 1

That's not good!

From looking at the commits between 12.2.2 and 12.2.4, this one looks
suspicious:

commit ddba907279719631903e3a20543056d81d176a1b
Author: Yan, Zheng 
Date:   Tue Oct 31 16:56:51 2017 +0800

mds: fix MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 definition

Fixes: http://tracker.ceph.com/issues/21985
Signed-off-by: "Yan, Zheng" 
(cherry picked from commit 6c1543dfc55d6db8493535b9b62a30236cf8c638)

John



>
>
> The cephfs cluster was down until I updated all MDS's to 12.2.4 --
> then they restarted cleanly.
>
> Looks like a pretty serious bug??!!
>
> Cheers, Dan
>
>
> [1] here is the standby restarting, 4 seconds before the active MDS's 
> suicided:
>
> 2018-02-28 10:25:18.222865 7f9f1ea3b1c0  0 set uid:gid to 167:167 (ceph:ceph)
> 2018-02-28 10:25:18.222892 7f9f1ea3b1c0  0 ceph version 12.2.4
> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
> (unknown), pid 10648


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
On Wed, Feb 28, 2018 at 11:38 AM, Patrick Donnelly  wrote:
> On Wed, Feb 28, 2018 at 2:07 AM, Dan van der Ster  wrote:
>> (Sorry to spam)
>>
>> I guess it's related to this fix to the layout v2 feature id:
>> https://github.com/ceph/ceph/pull/18782/files
>>
>> -#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8,
>> "file layout v2")
>> +#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9,
>> "file layout v2")
>
> Yes, this looks to be the issue.
>
>> Is there a way to update from 12.2.2 without causing the other active
>> MDS's to suicide?
>
> I think it will be necessary to reduce the actives to 1 (max_mds -> 1;
> deactivate other ranks), shutdown standbys, upgrade the single active,
> then upgrade/start the standbys.
>
> Unfortunately this didn't get flagged in upgrade testing. Thanks for
> the report Dan.

Thanks Patrick -- that's a good idea to reduce to 1 active.
I've create http://tracker.ceph.com/issues/23172 in case any followup is needed.

Cheers, Dan


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Patrick Donnelly
On Wed, Feb 28, 2018 at 2:07 AM, Dan van der Ster  wrote:
> (Sorry to spam)
>
> I guess it's related to this fix to the layout v2 feature id:
> https://github.com/ceph/ceph/pull/18782/files
>
> -#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8,
> "file layout v2")
> +#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9,
> "file layout v2")

Yes, this looks to be the issue.

> Is there a way to update from 12.2.2 without causing the other active
> MDS's to suicide?

I think it will be necessary to reduce the actives to 1 (max_mds -> 1;
deactivate other ranks), shutdown standbys, upgrade the single active,
then upgrade/start the standbys.

Unfortunately this didn't get flagged in upgrade testing. Thanks for
the report Dan.
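The sequence Patrick describes might be sketched roughly as below. This is a hedged dry-run sketch, not an authoritative procedure: the filesystem name "cephfs", the systemd unit names, and the single extra rank are all assumptions about your deployment. The `run` wrapper only prints each command; review the order against your cluster before executing anything.

```shell
# Dry-run sketch of the single-active upgrade order (hypothetical names).
run() { echo "$@"; }                      # swap 'echo' for actual execution

run ceph fs set cephfs max_mds 1          # 1. reduce actives to 1
run ceph mds deactivate cephfs:1          # 2. deactivate each rank > 0
run systemctl stop ceph-mds@standby       # 3. shut down the standbys
# 4. upgrade the ceph packages on the lone active MDS host, then:
run systemctl restart ceph-mds@active     #    restart it on the new version
# 5. upgrade the standby hosts, then:
run systemctl start ceph-mds@standby      #    bring the standbys back
```

During step 4 there is the unavoidable metadata I/O pause discussed above, since no standby on the old version can safely take over.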

-- 
Patrick Donnelly


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
(Sorry to spam)

I guess it's related to this fix to the layout v2 feature id:
https://github.com/ceph/ceph/pull/18782/files

-#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8,
"file layout v2")
+#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9,
"file layout v2")

Is there a way to update from 12.2.2 without causing the other active
MDS's to suicide?

Cheers, Dan



On Wed, Feb 28, 2018 at 11:01 AM, Dan van der Ster  wrote:
> More:
>
> here is the MDS_FEATURES map for a running 12.2.2 cluster:
>
> compat: compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in
> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> omap,8=file layout v2}
>
> and here it is on this updated 12.2.4 cluster:
>
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2}
>
>
> feature bit 8 is not the same for these two. Am I confused, or did
> these features get changed in 12.2.3/4?
>
>
> Cheers, Dan
>
> p.s. yes 12.2.4 is tagged and out -- check your favourite repo.


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
More:

here is the MDS_FEATURES map for a running 12.2.2 cluster:

compat: compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in
separate object,5=mds uses versioned encoding,6=dirfrag is stored in
omap,8=file layout v2}

and here it is on this updated 12.2.4 cluster:

compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
anchor table,9=file layout v2}


feature bit 8 is not the same for these two. Am I confused, or did
these features get changed in 12.2.3/4?


Cheers, Dan

p.s. yes 12.2.4 is tagged and out -- check your favourite repo.
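The compat mismatch above can be illustrated with a small sketch. This is not Ceph's actual C++ CompatSet code, just a hedged model of its behavior: an MDS can only serve an mdsmap whose incompat feature ids are all ones it supports. Because PR 18782 moved "file layout v2" from id 8 to id 9 (and id 8 became "no anchor table"), a 12.2.4 mdsmap requires id 9, which a 12.2.2 daemon never defined, so the check fails and the daemon suicides.

```python
# Incompat feature sets as logged in this thread (id -> name).
MDSMAP_12_2_4 = {1: "base v0.20", 2: "client writeable ranges",
                 3: "default file layouts on dirs", 4: "dir inode in separate object",
                 5: "mds uses versioned encoding", 6: "dirfrag is stored in omap",
                 8: "no anchor table", 9: "file layout v2"}
DAEMON_12_2_2 = {1: "base v0.20", 2: "client writeable ranges",
                 3: "default file layouts on dirs", 4: "dir inode in separate object",
                 5: "mds uses versioned encoding", 6: "dirfrag is stored in omap",
                 7: "mds uses inline data", 8: "file layout v2"}

def writeable(map_features, daemon_features):
    """Model of the check: every incompat id the map requires must be
    an id the daemon itself supports (names are informational here)."""
    return set(map_features) <= set(daemon_features)

print(writeable(MDSMAP_12_2_4, DAEMON_12_2_2))  # False: map requires id 9
```

Note the check is one-directional: the daemon may support extra features (id 7 here); only the map's requirements matter.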


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Stefan Kooman
Quoting Dan van der Ster (d...@vanderster.com):
> Hi all,
> 
> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
> OSD's updated fine.

12.2.4? Did you mean 12.2.3? Or did I miss something?

Gr. stefan

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


[ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
Hi all,

I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
OSD's updated fine.

When updating the MDS's (we have 2 active and 1 standby), I started
with the standby.

At the moment the standby MDS restarted into 12.2.4 [1], both active
MDSs (still running 12.2.2) suicided like this:

2018-02-28 10:25:22.761413 7f03da1b9700  0 mds.cephdwightmds0
handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
inode in separate object,5=mds uses versioned encoding,6=dirfrag is
stored in omap,8=no anchor table,9=file layout v2} not writeable with
daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in
separate object,5=mds uses versioned encoding,6=dirfrag is stored in
omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-02-28 10:25:22.761429 7f03da1b9700  1 mds.cephdwightmds0 suicide.
wanted state up:active
2018-02-28 10:25:23.763226 7f03da1b9700  1 mds.0.18147 shutdown:
shutting down rank 0


2018-02-28 10:25:22.761590 7f11df538700  0 mds.cephdwightmds1
handle_mds_map mdsmap compatset compat={},rocompat={}
,incompat={1=base v0.20,2=client writeable ranges,3=default file
layouts on dirs,4=dir inode in separate object,5=m
ds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
table,9=file layout v2} not writeable with daemo
n features compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=
dir inode in separate object,5=mds uses versioned encoding,6=dirfrag
is stored in omap,7=mds uses inline data,8=fil
e layout v2}, killing myself
2018-02-28 10:25:22.761613 7f11df538700  1 mds.cephdwightmds1 suicide.
wanted state up:active
2018-02-28 10:25:23.765653 7f11df538700  1 mds.1.18366 shutdown:
shutting down rank 1



The cephfs cluster was down until I updated all MDS's to 12.2.4 --
then they restarted cleanly.

Looks like a pretty serious bug??!!

Cheers, Dan


[1] here is the standby restarting, 4 seconds before the active MDS's suicided:

2018-02-28 10:25:18.222865 7f9f1ea3b1c0  0 set uid:gid to 167:167 (ceph:ceph)
2018-02-28 10:25:18.222892 7f9f1ea3b1c0  0 ceph version 12.2.4
(52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
(unknown), pid 10648