Re: [ceph-users] MGR Logs after Failure Testing

2019-06-28 Thread Eugen Block
You may want to configure your standby MDS daemons as "standby-replay" so
that the MDS taking over from the failed one needs less time for the
takeover. To do this, add something like the following to your ceph.conf:


---snip---
[mds.server1]
mds_standby_replay = true
mds_standby_for_rank = 0

[mds.server2]
mds_standby_replay = true
mds_standby_for_rank = 0

[mds.server3]
mds_standby_replay = true
mds_standby_for_rank = 0
---snip---

For your setup this would mean one active MDS, one standby-replay (which
takes over almost immediately; depending on the load, a very short
interruption can still happen), and one plain standby (a "cold standby",
if you will). Currently both of your standby MDS daemons are "cold".



Quoting dhils...@performair.com:


Eugen;

All services are running, yes, though they didn't all start when I
brought the host up (they were configured not to start, because the last
thing I had done was physically relocate the entire cluster).


All services are running, and happy.

# ceph status
  cluster:
id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
health: HEALTH_OK

  services:
mon: 3 daemons, quorum S700028,S700029,S700030 (age 20h)
mgr: S700028(active, since 17h), standbys: S700029, S700030
mds: cifs:1 {0=S700029=up:active} 2 up:standby
osd: 6 osds: 6 up (since 21h), 6 in (since 21h)

  data:
pools:   16 pools, 192 pgs
objects: 449 objects, 761 MiB
usage:   724 GiB used, 65 TiB / 66 TiB avail
pgs: 192 active+clean

# ceph osd tree
ID CLASS WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       66.17697 root default
-5       22.05899     host S700029
 2   hdd 11.02950         osd.2          up      1.0     1.0
 3   hdd 11.02950         osd.3          up      1.0     1.0
-7       22.05899     host S700030
 4   hdd 11.02950         osd.4          up      1.0     1.0
 5   hdd 11.02950         osd.5          up      1.0     1.0
-3       22.05899     host s700028
 0   hdd 11.02950         osd.0          up      1.0     1.0
 1   hdd 11.02950         osd.1          up      1.0     1.0

The question about configuring the MDS as a failover struck me as a
potential issue, since I don't remember doing that; however, it looks
like S700029 (10.0.200.111) took over from S700028 (10.0.200.110) as the
active MDS.


Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eugen Block
Sent: Thursday, June 27, 2019 8:23 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MGR Logs after Failure Testing

Hi,

some more information about the cluster status would be helpful, such as

ceph -s
ceph osd tree

service status of all MONs, MDSs, MGRs.
Are all services up? Did you configure the spare MDS as standby for
rank 0 so that a failover can happen?

Regards,
Eugen


Quoting dhils...@performair.com:


All;

I built a demonstration and testing cluster, just 3 hosts
(10.0.200.110, 111, 112).  Each host runs mon, mgr, osd, mds.

During the demonstration yesterday, I pulled the power on one of the hosts.

After bringing the host back up, I'm getting several error messages
every second or so:
2019-06-26 16:01:56.424 7fcbe0af9700  0 ms_deliver_dispatch:
unhandled message 0x55e80a728f00 mgrreport(mds.S700030 +0-0 packed
6) v7 from mds.? v2:10.0.200.112:6808/980053124
2019-06-26 16:01:56.425 7fcbf4cd1700  1 mgr finish mon failed to
return metadata for mds.S700030: (2) No such file or directory
2019-06-26 16:01:56.429 7fcbe0af9700  0 ms_deliver_dispatch:
unhandled message 0x55e809f8e600 mgrreport(mds.S700029 +110-0 packed
1366) v7 from mds.0 v2:10.0.200.111:6808/2726495738
2019-06-26 16:01:56.430 7fcbf4cd1700  1 mgr finish mon failed to
return metadata for mds.S700029: (2) No such file or directory

Thoughts?

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread DHilsbos
Eugen;

All services are running, yes, though they didn't all start when I brought the 
host up (they were configured not to start, because the last thing I had done 
was physically relocate the entire cluster).

All services are running, and happy.

# ceph status
  cluster:
id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
health: HEALTH_OK

  services:
mon: 3 daemons, quorum S700028,S700029,S700030 (age 20h)
mgr: S700028(active, since 17h), standbys: S700029, S700030
mds: cifs:1 {0=S700029=up:active} 2 up:standby
osd: 6 osds: 6 up (since 21h), 6 in (since 21h)

  data:
pools:   16 pools, 192 pgs
objects: 449 objects, 761 MiB
usage:   724 GiB used, 65 TiB / 66 TiB avail
pgs: 192 active+clean

# ceph osd tree
ID CLASS WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       66.17697 root default
-5       22.05899     host S700029
 2   hdd 11.02950         osd.2          up      1.0     1.0
 3   hdd 11.02950         osd.3          up      1.0     1.0
-7       22.05899     host S700030
 4   hdd 11.02950         osd.4          up      1.0     1.0
 5   hdd 11.02950         osd.5          up      1.0     1.0
-3       22.05899     host s700028
 0   hdd 11.02950         osd.0          up      1.0     1.0
 1   hdd 11.02950         osd.1          up      1.0     1.0

The question about configuring the MDS as a failover struck me as a potential 
issue, since I don't remember doing that; however, it looks like S700029 
(10.0.200.111) took over from S700028 (10.0.200.110) as the active MDS.
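
A quick way to double-check which daemon holds rank 0 now and how the
standbys are registered (plain Ceph CLI; the filesystem name "cifs" is
taken from the status output above):

---snip---
# list the ranks and standby daemons of the "cifs" filesystem
ceph fs status cifs

# or dump the full FSMap, which also shows any standby-replay assignments
ceph fs dump
---snip---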

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eugen Block
Sent: Thursday, June 27, 2019 8:23 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MGR Logs after Failure Testing

Hi,

some more information about the cluster status would be helpful, such as

ceph -s
ceph osd tree

service status of all MONs, MDSs, MGRs.
Are all services up? Did you configure the spare MDS as standby for  
rank 0 so that a failover can happen?

Regards,
Eugen


Quoting dhils...@performair.com:

> All;
>
> I built a demonstration and testing cluster, just 3 hosts  
> (10.0.200.110, 111, 112).  Each host runs mon, mgr, osd, mds.
>
> During the demonstration yesterday, I pulled the power on one of the hosts.
>
> After bringing the host back up, I'm getting several error messages  
> every second or so:
> 2019-06-26 16:01:56.424 7fcbe0af9700  0 ms_deliver_dispatch:  
> unhandled message 0x55e80a728f00 mgrreport(mds.S700030 +0-0 packed  
> 6) v7 from mds.? v2:10.0.200.112:6808/980053124
> 2019-06-26 16:01:56.425 7fcbf4cd1700  1 mgr finish mon failed to  
> return metadata for mds.S700030: (2) No such file or directory
> 2019-06-26 16:01:56.429 7fcbe0af9700  0 ms_deliver_dispatch:  
> unhandled message 0x55e809f8e600 mgrreport(mds.S700029 +110-0 packed  
> 1366) v7 from mds.0 v2:10.0.200.111:6808/2726495738
> 2019-06-26 16:01:56.430 7fcbf4cd1700  1 mgr finish mon failed to  
> return metadata for mds.S700029: (2) No such file or directory
>
> Thoughts?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread Eugen Block

Hi,

some more information about the cluster status would be helpful, such as

ceph -s
ceph osd tree

service status of all MONs, MDSs, MGRs.
Are all services up? Did you configure the spare MDS as standby for  
rank 0 so that a failover can happen?
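
One way to check that (just a sketch with the standard CLI; the daemon
name below is only an example taken from the log lines you posted, and
"ceph config show" needs Mimic or later):

---snip---
# current MDS map: active ranks plus the number of registered standbys
ceph mds stat

# standby-related options as seen by one of the spare MDS daemons
ceph config show mds.S700030 | grep standby
---snip---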


Regards,
Eugen


Quoting dhils...@performair.com:


All;

I built a demonstration and testing cluster, just 3 hosts  
(10.0.200.110, 111, 112).  Each host runs mon, mgr, osd, mds.


During the demonstration yesterday, I pulled the power on one of the hosts.

After bringing the host back up, I'm getting several error messages  
every second or so:
2019-06-26 16:01:56.424 7fcbe0af9700  0 ms_deliver_dispatch:  
unhandled message 0x55e80a728f00 mgrreport(mds.S700030 +0-0 packed  
6) v7 from mds.? v2:10.0.200.112:6808/980053124
2019-06-26 16:01:56.425 7fcbf4cd1700  1 mgr finish mon failed to  
return metadata for mds.S700030: (2) No such file or directory
2019-06-26 16:01:56.429 7fcbe0af9700  0 ms_deliver_dispatch:  
unhandled message 0x55e809f8e600 mgrreport(mds.S700029 +110-0 packed  
1366) v7 from mds.0 v2:10.0.200.111:6808/2726495738
2019-06-26 16:01:56.430 7fcbf4cd1700  1 mgr finish mon failed to  
return metadata for mds.S700029: (2) No such file or directory


Thoughts?

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread DHilsbos
All;

I built a demonstration and testing cluster, just 3 hosts (10.0.200.110, 111, 
112).  Each host runs mon, mgr, osd, mds.

During the demonstration yesterday, I pulled the power on one of the hosts.

After bringing the host back up, I'm getting several error messages every 
second or so:
2019-06-26 16:01:56.424 7fcbe0af9700  0 ms_deliver_dispatch: unhandled message 
0x55e80a728f00 mgrreport(mds.S700030 +0-0 packed 6) v7 from mds.? 
v2:10.0.200.112:6808/980053124
2019-06-26 16:01:56.425 7fcbf4cd1700  1 mgr finish mon failed to return 
metadata for mds.S700030: (2) No such file or directory
2019-06-26 16:01:56.429 7fcbe0af9700  0 ms_deliver_dispatch: unhandled message 
0x55e809f8e600 mgrreport(mds.S700029 +110-0 packed 1366) v7 from mds.0 
v2:10.0.200.111:6808/2726495738
2019-06-26 16:01:56.430 7fcbf4cd1700  1 mgr finish mon failed to return 
metadata for mds.S700029: (2) No such file or directory
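
A couple of commands that might help narrow this down (a sketch with the
standard CLI; the daemon names are taken from the log lines above, and the
active MGR name, S700028, is an assumption based on the ceph status output
elsewhere in this thread):

---snip---
# check whether the MONs actually hold metadata for the MDS daemons in question
ceph mds metadata S700030
ceph mds metadata S700029

# if the metadata is there but the MGR keeps logging, failing over to a
# standby MGR lets the new active one pull fresh daemon metadata
ceph mgr fail S700028
---snip---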

Thoughts?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com