There is an admin socket (asok) on computer06.
I tried to start mon.computer06 again, but about two hours later it still
has not finished starting.
There are, however, several related processes running on computer06, and I
don't know how to handle them:
root 7812 1 0 11:39 pts/4 00:00:00 python
/usr/sbin/ceph-create-keys -i computer06
root 11025 1 12 09:02 pts/4 00:32:13 /usr/bin/ceph-mon -i computer06
--pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
root 35692 7812 0 12:59 pts/4 00:00:00 python /usr/bin/ceph
--cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status
I got the quorum_status from another running monitor:
{ "election_epoch": 508,
  "quorum": [
        0,
        1],
  "quorum_names": [
        "computer05",
        "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
      "modified": "2014-07-26 09:52:02.411967",
      "created": "0.00",
      "mons": [
            { "rank": 0,
              "name": "computer04",
              "addr": "192.168.1.60:6789\/0"},
            { "rank": 1,
              "name": "computer05",
              "addr": "192.168.1.65:6789\/0"},
            { "rank": 2,
              "name": "computer06",
              "addr": "192.168.1.66:6789\/0"}]}}
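For anyone reading this output later: a quick way to see which monitor is missing is to compare the names in the monmap with quorum_names. The following is only a minimal Python sketch over an abbreviated copy of the output above (quotes restored so it parses as JSON); the function name mons_outside_quorum is made up for illustration and is not part of any Ceph tool.

```python
import json

# Abbreviated copy of the quorum_status output quoted in this thread.
# A raw string is used so the "\/" escapes reach the JSON parser unchanged.
QUORUM_STATUS = r"""
{
  "election_epoch": 508,
  "quorum": [0, 1],
  "quorum_names": ["computer05", "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": {
    "epoch": 4,
    "mons": [
      {"rank": 0, "name": "computer04", "addr": "192.168.1.60:6789\/0"},
      {"rank": 1, "name": "computer05", "addr": "192.168.1.65:6789\/0"},
      {"rank": 2, "name": "computer06", "addr": "192.168.1.66:6789\/0"}
    ]
  }
}
"""


def mons_outside_quorum(status):
    """Return names of monitors listed in the monmap but absent from the quorum."""
    in_quorum = set(status["quorum_names"])
    return [m["name"] for m in status["monmap"]["mons"] if m["name"] not in in_quorum]


status = json.loads(QUORUM_STATUS)
print(mons_outside_quorum(status))  # prints ['computer06']
```

This only confirms what the two running monitors believe; it says nothing about why mon.computer06 has not joined.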
Date: Tue, 31 Mar 2015 12:30:22 -0700
Subject: Re: [ceph-users] One of three monitors can not be started
From: g...@gregs42.com
To: zhanghaoyu1...@hotmail.com
CC: ceph-users@lists.ceph.com
On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 zhanghaoyu1...@hotmail.com wrote:
Can anyone help me?
One monitor in my ceph cluster cannot be started.
Before that, I added '[mon] mon_compact_on_start = true' to
/etc/ceph/ceph.conf on all three monitor hosts. Then I ran 'ceph tell
mon.computer05 compact' on computer05, which has a monitor on it.
When store.db on computer05 shrank from 108G to 1G, mon.computer06 stopped,
and it has not been able to start since.
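To compare store sizes across the three monitor hosts before and after compaction, something like the Python sketch below could be run on each host. It is only an illustration: the helper name dir_size_bytes is made up, and the path in the usage note assumes the default mon data location, which this thread does not actually state.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (similar in spirit to du -s)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files can disappear while the store is being compacted.
                pass
    return total
```

Usage, assuming the default mon data path (an assumption, adjust to your setup): dir_size_bytes("/var/lib/ceph/mon/ceph-computer06/store.db") / 2**30 gives the size in GiB.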
When I start mon.computer06, it hangs in this state:
# /etc/init.d/ceph start mon.computer06
=== mon.computer06 ===
Starting Ceph mon.computer06 on computer06...
The process info is like this:
root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
mon.computer06
root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
/usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
-c /etc/ceph/ceph.conf
root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
--pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
--pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
The log on computer06 looks like this:
2015-03-30 20:46:54.152956 7fc5379d07a0 0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
...
2015-03-30 20:46:54.759791 7fc5379d07a0 1 mon.computer06@-1(probing) e4
preinit clean up potentially inconsistent store state
So I haven't looked at this code in a while, but I think the monitor
is trying to validate that it's consistent with the others. You
probably want to dig around the monitor admin sockets and see what
state each monitor is in, plus its perception of the others.
In this case, I think maybe mon.computer06 is trying to examine its
whole store, but 100GB is a lot (way too much, in fact), so this can
take a long time.
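As a rough illustration of "digging around the admin sockets", here is a minimal Python sketch that wraps the same command visible in the process list in this thread (ceph --cluster=ceph --admin-daemon=... mon_status). It has to run on the host where the monitor lives. The helper names, the timeout, and the JSON fields read in summarize (name, state, rank, quorum) are assumptions about the mon_status output, not something confirmed here.

```python
import json
import subprocess


def default_runner(cmd, timeout):
    """Run cmd and return its stdout; raises on a non-zero exit or a timeout."""
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout


def mon_status(mon_id, cluster="ceph", timeout=10, runner=default_runner):
    """Ask a local monitor for its status through its admin socket."""
    # Socket path pattern taken from the process list quoted in this thread.
    asok = f"/var/run/ceph/{cluster}-mon.{mon_id}.asok"
    cmd = ["ceph", f"--cluster={cluster}", f"--admin-daemon={asok}", "mon_status"]
    try:
        return json.loads(runner(cmd, timeout))
    except (subprocess.SubprocessError, OSError, ValueError) as exc:
        # A monitor that is still starting may not answer, so report instead of crashing.
        return {"name": mon_id, "error": str(exc)}


def summarize(status):
    """One-line view of a monitor's own state (field names are assumed)."""
    if "error" in status:
        return f"{status['name']}: no answer ({status['error']})"
    return (
        f"{status.get('name')}: state={status.get('state')} "
        f"rank={status.get('rank')} quorum={status.get('quorum')}"
    )
```

For example, print(summarize(mon_status("computer06"))) on computer06, and the same with "computer04" and "computer05" on their hosts, would let you compare what each monitor thinks of itself. The timeout matters here because a monitor that is busy with its store may never reply.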
Sorry, my English is not good.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com