Re: [ceph-users] Mon Create currently at the state of probing

2017-06-22 Thread Jim Forde
David,

SUCCESS!! Thank you so much!

I rebuilt the node because I could not install Jewel over the remnants of 
Kraken.
So, while I did install Jewel, I am not convinced that was the solution; what fixed it 
was something I had not tried under the Kraken attempts.

For future_me, here was the solution.

Removed all references to r710e from ceph.conf on the ceph-deploy node, in the original 
deployment folder home/cephadminaccount/ceph-cluster/ceph.conf
"ceph-deploy --overwrite-conf config push r710a r710b r710c" etc. to all nodes, including 
the ceph-deploy node, so it is now in /etc/ceph/ceph.conf everywhere
"ceph-deploy install --release jewel r710e"
"ceph-deploy admin r710e"
"sudo chmod +r /etc/ceph/ceph.client.admin.keyring" run on node r710e
"ceph-deploy mon create r710e"

Node was created but still had the very same probing errors. Ugh.

Then I went back to home/cephadminaccount/ceph-cluster/ceph.conf, added r710e back in 
just the way it was before, and pushed it to all nodes:
"ceph-deploy --overwrite-conf config push r710a r710b r710c" etc.
"sudo reboot" on r710g (I don't know if this was necessary). When it came up, ceph -s 
was good. Rebooted r710e for good measure. Did not reboot r710f.
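
For future_me again, the same sequence as one rough sketch (hostnames abbreviated the 
same way as above, run from the deployment folder ~/ceph-cluster under the 
cephadminaccount user; the last step is an untested, lighter-weight alternative to my 
reboot):

cd ~/ceph-cluster
# 1. With r710e temporarily removed from ceph.conf, push the config to every node.
ceph-deploy --overwrite-conf config push r710a r710b r710c r710d r710e r710f r710g r710T
# 2. Install Jewel on the rebuilt node, hand it the admin keyring, create the mon.
ceph-deploy install --release jewel r710e
ceph-deploy admin r710e
ssh r710e sudo chmod +r /etc/ceph/ceph.client.admin.keyring
ceph-deploy mon create r710e
# 3. Add r710e back to mon_initial_members/mon_host in ceph.conf, then push again.
ceph-deploy --overwrite-conf config push r710a r710b r710c r710d r710e r710f r710g r710T
# 4. Restart (or reboot) an existing mon so it re-reads the config.
#    Upstart on 14.04; on systemd it would be: sudo systemctl restart ceph-mon@r710g
ssh r710g sudo restart ceph-mon id=r710g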

I am wondering whether just pushing the ceph.conf back out in the first place would 
have solved the problem.
That is for another day.

-Jim


From: David Turner [mailto:drakonst...@gmail.com]
Sent: Wednesday, June 21, 2017 4:19 PM
To: Jim Forde <j...@mninc.net>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mon Create currently at the state of probing

You can specify an option in ceph-deploy to tell it which release of ceph to install: 
jewel, kraken, hammer, etc. `ceph-deploy --release jewel` would pin the command to 
using jewel instead of kraken.
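
For example (the hostname is from your cluster; the admin-socket check assumes the mon 
daemon is already running on that host):

ceph-deploy install --release jewel r710e       # pin the install to jewel
ssh r710e ceph -v                               # package version on the node
ssh r710e sudo ceph daemon mon.r710e version    # version of the running mon, via its admin socket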

While running a mixed environment is supported, it should always be tested before 
assuming it will work for you in production. The mons are quick enough to upgrade that 
I always do them together. Following that, I upgrade half of my OSDs in a test 
environment and leave it there for a couple of weeks (or until adequate testing is 
done) before upgrading the remaining OSDs and again waiting until the testing is done. 
I would probably do the MDSs before the OSDs, but I don't usually think about that 
since I don't have them in a production environment. Lastly, I would test upgrading 
the clients (VM hosts, RGW, kernel clients, etc.) and test this state the most 
thoroughly. In production I haven't had to worry about an upgrade taking longer than a 
few hours with over 60 OSD nodes, 5 mons, and a dozen clients. I just don't see a need 
to run in a mixed environment in production, even if it is supported.
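
A rough sketch of how I keep an eye on who is running what during a staged upgrade 
(as far as I know the `ceph versions` summary from later releases isn't there yet on 
Jewel/Kraken, so this leans on `ceph tell` and the admin sockets; daemon IDs are from 
your cluster):

ceph tell osd.* version                        # every OSD reports its running version
for m in r710e r710f r710g; do                 # I'm not sure the wildcard form works for mons here,
    ssh "$m" sudo ceph daemon mon.$m version   # so ask each one over its admin socket
done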

Back to your problem with adding in the mon. Do your existing mons know about the 
third mon, or have you removed it from their running config? It might be worth 
double-checking their config file and restarting the daemons once you know they will 
pick up the correct settings. It's hard for me to help with this part, as I've been 
lucky enough not to have any problems with the online docs for this when it's come up. 
I've replaced 5 mons without any issues. I didn't use ceph-deploy, though, except to 
install the packages, and did the manual steps instead.
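
Concretely, the sort of double-checking I mean (run on a mon host; the restart line 
assumes Upstart on 14.04, with the systemd equivalent commented):

ceph mon dump                                  # what the quorum thinks the monmap is
sudo ceph daemon mon.r710f mon_status          # this mon's own view, incl. extra_probe_peers
grep -E 'mon_initial_members|mon_host|public network' /etc/ceph/ceph.conf
sudo restart ceph-mon id=r710f                 # Upstart (Ubuntu 14.04)
# sudo systemctl restart ceph-mon@r710f        # systemd (Ubuntu 16.04+)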

Hopefully adding the mon back on Jewel fixes the issue.  That would be the 
easiest outcome.  I don't know that the Ceph team has tested adding upgraded 
mons to an old quorum.

Re: [ceph-users] Mon Create currently at the state of probing

2017-06-21 Thread Jim Forde
David,
Thanks for the reply.

The scenario:
A monitor node fails for whatever reason: bad blocks in the HD, motherboard failure, 
whatever.

Procedure:
Remove the monitor from the cluster, replace the hardware, reinstall the OS, and add 
the monitor back to the cluster.

That is exactly what I did. However, my ceph-deploy node had already been upgraded to 
Kraken.
The goal is not to use this as an upgrade path per se, but to recover from a failed 
monitor node in a cluster where an upgrade is in progress.

The upgrade notes for Jewel to Kraken say you may upgrade OSDs, monitors, and MDSs in 
any order. Perhaps I am reading too much into this, but I took it to mean I could 
proceed with the upgrade at my leisure, making sure each node is successfully upgraded 
before proceeding to the next. The implication is that I can run the cluster with 
daemons of different versions (at least during the upgrade process).

So that brings me to the problem at hand.
What is the correct procedure for replacing a failed monitor node, especially if the 
failed monitor is a mon_initial_member?
Does it have to be the same version as the other monitors in the cluster?
I do have a public network statement in the ceph.conf file.
The monitor r710e is listed as one of the mon_initial_members in ceph.conf with the 
correct IP address, but the error messages are:
"[r710e][WARNIN] r710e is not defined in `mon initial members`"
and "[r710e][WARNIN] monitor r710e does not exist in monmap"
Should I manually inject r710e into the monmap?
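
(What I have in mind for the manual route, roughly, pieced together from the 
add/remove-monitors docs and untested here; 10.0.40.25 is r710e's address from the 
ceph.conf below:)

# On an existing mon / admin node: grab the current monmap and the mon. keyring.
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring
# On r710e: build the mon data dir from that monmap and keyring.
sudo ceph-mon --mkfs -i r710e --monmap /tmp/monmap --keyring /tmp/mon.keyring
sudo chown -R ceph:ceph /var/lib/ceph/mon/ceph-r710e   # Jewel+ daemons run as the ceph user
# Tell the cluster about the new mon, then start the daemon.
ceph mon add r710e 10.0.40.25:6789
sudo start ceph-mon id=r710e    # Upstart; systemd: sudo systemctl start ceph-mon@r710e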



INFO

ceph.conf
cat /etc/ceph/ceph.conf
[global]
fsid = 0be01315-7928-4037-ae7c-1b0cd36e52b8
mon_initial_members = r710g,r710f,r710e
mon_host = 10.0.40.27,10.0.40.26,10.0.40.25
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.0.40.0/24
cluster network = 10.0.50.0/24

[mon]
mon host = r710g,r710f,r710e
mon addr = 10.0.40.27:6789,10.0.40.26:6789,10.0.40.25:6789


monmap
monmaptool: monmap file /tmp/monmap
epoch 12
fsid 0be01315-7928-4037-ae7c-1b0cd36e52b8
last_changed 2017-06-15 08:15:10.542055
created 2016-11-17 11:42:18.481472
0: 10.0.40.26:6789/0 mon.r710f
1: 10.0.40.27:6789/0 mon.r710g
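
(For reference, roughly how that dump was pulled:)

ceph mon getmap -o /tmp/monmap     # export the cluster's current monmap
monmaptool --print /tmp/monmap     # print the epoch, fsid and mon entries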


Status
ceph -s
cluster 0be01315-7928-4037-ae7c-1b0cd36e52b8
 health HEALTH_OK
 monmap e12: 2 mons at {r710f=10.0.40.26:6789/0,r710g=10.0.40.27:6789/0}
election epoch 252, quorum 0,1 r710f,r710g
 osdmap e7017: 16 osds: 16 up, 16 in
flags sortbitwise,require_jewel_osds
  pgmap v14484684: 256 pgs, 1 pools, 218 GB data, 56119 objects
661 GB used, 8188 GB / 8849 GB avail
 256 active+clean
  client io 8135 B/s rd, 44745 B/s wr, 0 op/s rd, 5 op/s wr


PS.

Tried this too
ceph mon remove r710e
mon.r710e does not exist or has already been removed



Re: [ceph-users] Mon Create currently at the state of probing

2017-06-19 Thread David Turner
Question... Why are you reinstalling the node, removing the mon from the cluster, and 
adding it back into the cluster to upgrade to Kraken?  The upgrade path from 10.2.5 to 
11.2.0 is an acceptable upgrade path.  If you just needed to reinstall the OS for some 
reason, then you can keep the /var/lib/ceph/mon/r710e/ folder intact and not need to 
remove/re-add the mon to reinstall the OS.  Even if you upgraded from 14.04 to 16.04, 
this would work.  You would want to change the upstart file in the daemon's folder to 
systemd and make sure it works with systemctl just fine, but the daemon itself would 
be fine.
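
Roughly what I mean, assuming the default data-dir layout (ceph-r710e) and that the 
new OS runs systemd; the upstart/systemd files below are the init-system marker files 
Ceph keeps in the mon's data directory:

# Save the mon's data dir before the OS reinstall, restore it afterwards.
sudo tar -C /var/lib/ceph/mon -czf /root/mon-r710e.tgz ceph-r710e
#   ... reinstall the OS, install the same Ceph release, copy the tarball back ...
sudo tar -C /var/lib/ceph/mon -xzf /root/mon-r710e.tgz
# Swap the init marker from upstart to systemd, fix ownership, enable the unit.
sudo rm /var/lib/ceph/mon/ceph-r710e/upstart
sudo touch /var/lib/ceph/mon/ceph-r710e/systemd
sudo chown -R ceph:ceph /var/lib/ceph/mon/ceph-r710e
sudo systemctl enable --now ceph-mon@r710e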

If you are hell-bent on doing this the hardest way I've ever heard of, then you might 
want to check out this note from the docs for adding/removing a mon.  Since you are 
far enough removed from the initial ceph-deploy, you have removed r710e from your 
configuration, and if you don't have a public network statement in your ceph.conf 
file... that could be your problem with the probing.

http://docs.ceph.com/docs/kraken/rados/deployment/ceph-deploy-mon/

"Note: When adding a monitor on a host that was not in hosts initially defined with 
the ceph-deploy new command, a public network statement needs to be added to the 
ceph.conf file."
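
For reference, that is just one line in the [global] section of ceph.conf, e.g. (the 
subnet here is this cluster's public network):

[global]
public network = 10.0.40.0/24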


Re: [ceph-users] Mon Create currently at the state of probing

2017-06-19 Thread Jim Forde
No, I don’t think Ubuntu 14.04 has it enabled by default.
Double checked.
sudo ufw status
Status: inactive.
No other symptoms of a firewall.
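
(For completeness, the sort of checks that would rule it out; the nc probe is run from 
another node, and 6789 is the mon port:)

sudo ufw status                # Ubuntu's front end - inactive here
sudo iptables -S               # raw rules, in case something bypasses ufw
nc -zv 10.0.40.25 6789         # from another node: is r710e's mon port even reachable?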

Re: [ceph-users] Mon Create currently at the state of probing

2017-06-18 Thread Sasha Litvak
Do you have a firewall enabled on the new server, by any chance?

[ceph-users] Mon Create currently at the state of probing

2017-06-18 Thread Jim Forde
I have an eight node ceph cluster running Jewel 10.2.5.
One ceph-deploy node, four OSD nodes, and three monitor nodes.
The ceph-deploy node is r710T.
OSDs are r710a, r710b, r710c, and r710d.
Mons are r710e, r710f, and r710g.
Name resolution is in the hosts file on each node.

Successfully removed monitor r710e from the cluster.
Upgraded ceph-deploy node r710T to Kraken 11.2.0 (ceph -v returns 11.2.0; all other 
nodes are still 10.2.5).
ceph -s is HEALTH_OK with 2 mons.
Rebuilt r710e with the same OS (Ubuntu 14.04 LTS) and the same IP address.
"ceph-deploy install --release kraken r710e" is successful, with ceph -v returning 
11.2.0 on node r710e
"ceph-deploy admin r710e" is successful and puts the keyring in 
/etc/ceph/ceph.client.admin.keyring
"sudo chmod +r /etc/ceph/ceph.client.admin.keyring"

Everything seems successful to this point.
Then I run
"ceph-deploy mon create r710e" and I get the following:

[r710e][DEBUG ] ********************************************************************************

[r710e][INFO  ] monitor: mon.r710e is currently at the state of probing
[r710e][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon 
/var/run/ceph/ceph-mon.r710e.asok mon_status
[r710e][WARNIN] r710e is not defined in `mon initial members`
[r710e][WARNIN] monitor r710e does not exist in monmap

R710e is in the 'mon initial members'.
It is in the ceph.conf file correctly (it was running before and the parameters have 
not changed); public and cluster networks are defined.
It is the same physical server with the same (but freshly installed) OS and the same 
IP address.
Looking at the local daemon mon_status on all three monitors, I see:
R710f and r710g see r710e as an "extra_probe_peers"
R710e sees r710f and r710g as "extra_probe_peers"
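
(That observation comes from the admin socket on each mon host, roughly; the mon IDs 
here match the short hostnames:)

# On each monitor host, ask the local daemon for its own status and look at the
# "extra_probe_peers" and "monmap" sections of the JSON it returns.
sudo ceph daemon mon.$(hostname -s) mon_status
# or, pointing at the socket directly:
sudo ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status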

"ceph-deploy purge r710e" and "ceph-deploy purgedata r710e" with a reboot of 
the 2 mon's brings cluster back to HEALTH_OK

Not sure what is going on. Is Ceph allergic to single node upgrades? Afraid to 
push the upgrade on all mon's.

What I have done:
Rebuilt r710e with different hardware. Rebuilt with a different OS. Rebuilt with a 
different name and IP address. Same result.
I have also restructured the NTP server. r710T is my NTP server on the cluster 
(HEALTH_OK prior to updating). I reset all mon nodes to get time from Ubuntu's default 
NTP sources. Same error.