Re: [ceph-users] One of three monitors can not be started

2015-03-31 Thread 张皓宇

There is an admin socket (asok) on computer06.
I tried to start mon.computer06 and waited about two hours, but it still has
not started.
There are now some different processes on computer06, though, and I don't know
how to handle them:
root  7812     1  0 11:39 pts/4 00:00:00 python
/usr/sbin/ceph-create-keys -i computer06
root 11025     1 12 09:02 pts/4 00:32:13 /usr/bin/ceph-mon -i computer06
--pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
root 35692  7812  0 12:59 pts/4 00:00:00 python /usr/bin/ceph
--cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status


I got the quorum_status from another running monitor:
{ election_epoch: 508,
  quorum: [
        0,
        1],
  quorum_names: [
        computer05,
        computer04],
  quorum_leader_name: computer04,
  monmap: { epoch: 4,
      fsid: 471483e5-493f-41f6-b6f4-0187c13d156d,
      modified: 2014-07-26 09:52:02.411967,
      created: 0.00,
      mons: [
            { rank: 0,
              name: computer04,
              addr: 192.168.1.60:6789\/0},
            { rank: 1,
              name: computer05,
              addr: 192.168.1.65:6789\/0},
            { rank: 2,
              name: computer06,
              addr: 192.168.1.66:6789\/0}]}}
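
The monitor that has dropped out can be read off this status by comparing quorum_names against the mons listed in the monmap. A minimal Python sketch of that comparison follows. It assumes the status is available as JSON (the quotes appear to have been stripped from the paste above, so they are restored by hand here), and the helper name mons_out_of_quorum is made up for illustration:

```python
import json


def mons_out_of_quorum(status):
    """Return names of monitors listed in the monmap but absent from the quorum."""
    in_quorum = set(status["quorum_names"])
    return [m["name"] for m in status["monmap"]["mons"]
            if m["name"] not in in_quorum]


# Trimmed copy of the status above, with JSON quoting restored by hand.
raw = """
{"election_epoch": 508,
 "quorum": [0, 1],
 "quorum_names": ["computer05", "computer04"],
 "quorum_leader_name": "computer04",
 "monmap": {"epoch": 4,
            "mons": [{"rank": 0, "name": "computer04", "addr": "192.168.1.60:6789/0"},
                     {"rank": 1, "name": "computer05", "addr": "192.168.1.65:6789/0"},
                     {"rank": 2, "name": "computer06", "addr": "192.168.1.66:6789/0"}]}}
"""

print(mons_out_of_quorum(json.loads(raw)))  # ['computer06']
```

Here the only monitor in the monmap that is missing from the quorum is computer06, which matches the monitor that will not start.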

 Date: Tue, 31 Mar 2015 12:30:22 -0700
 Subject: Re: [ceph-users] One of three monitors can not be started
 From: g...@gregs42.com
 To: zhanghaoyu1...@hotmail.com
 CC: ceph-users@lists.ceph.com
 
 On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 zhanghaoyu1...@hotmail.com wrote:
  Who can help me?
 
  One monitor in my ceph cluster cannot be started.
  Before that, I added '[mon] mon_compact_on_start = true' to
  /etc/ceph/ceph.conf on all three monitor hosts. Then I ran 'ceph tell
  mon.computer05 compact' on computer05, which hosts a monitor.
  When store.db on computer05 shrank from 108G to 1G, mon.computer06 stopped,
  and it has not been able to start since.
 
  If I start mon.computer06, it hangs in this state:
  # /etc/init.d/ceph start mon.computer06
  === mon.computer06 ===
  Starting Ceph mon.computer06 on computer06...
 
  The process info is like this:
  root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
  mon.computer06
  root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
  /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
  -c /etc/ceph/ceph.conf
  root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
  --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
  root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
  --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
 
  Log on computer06 is like this:
  2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2
  (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
  ...
  2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4
  preinit clean up potentially inconsistent store state
 
 So I haven't looked at this code in a while, but I think the monitor
 is trying to validate that it's consistent with the others. You
 probably want to dig around the monitor admin sockets and see what
 state each monitor is in, plus its perception of the others.
 
 In this case, I think maybe mon.computer06 is trying to examine its
 whole store, but 100GB is a lot (way too much, in fact), so this can
 take a long time.
 
 
  Sorry, my English is not good.
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 



Re: [ceph-users] One of three monitors can not be started

2015-03-31 Thread Gregory Farnum
On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 zhanghaoyu1...@hotmail.com wrote:
 Who can help me?

 One monitor in my ceph cluster cannot be started.
 Before that, I added '[mon] mon_compact_on_start = true' to
 /etc/ceph/ceph.conf on all three monitor hosts. Then I ran 'ceph tell
 mon.computer05 compact' on computer05, which hosts a monitor.
 When store.db on computer05 shrank from 108G to 1G, mon.computer06 stopped,
 and it has not been able to start since.

 If I start mon.computer06, it hangs in this state:
 # /etc/init.d/ceph start mon.computer06
 === mon.computer06 ===
 Starting Ceph mon.computer06 on computer06...

 The process info is like this:
 root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
 mon.computer06
 root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
 -c /etc/ceph/ceph.conf
 root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
 root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf

 Log on computer06 is like this:
 2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2
 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
 ...
 2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4
 preinit clean up potentially inconsistent store state

So I haven't looked at this code in a while, but I think the monitor
is trying to validate that it's consistent with the others. You
probably want to dig around the monitor admin sockets and see what
state each monitor is in, plus its perception of the others.
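
A rough Python sketch of that admin-socket check, to be run on each monitor host in turn. It assumes the socket path pattern /var/run/ceph/ceph-mon.<id>.asok and the mon_status command that appear in the process list from computer06. The helper names are illustrative only, and the fields picked out (name, state, quorum, outside_quorum) are what mon_status is expected to report rather than something checked against 0.72.2, so they are read with .get():

```python
import json
import subprocess


def mon_status_cmd(mon_id, cluster="ceph", run_dir="/var/run/ceph"):
    """Build the admin-socket mon_status query for one monitor."""
    asok = "{}/{}-mon.{}.asok".format(run_dir, cluster, mon_id)
    return ["ceph", "--cluster=" + cluster, "--admin-daemon=" + asok, "mon_status"]


def summarize(status):
    """Pick out the fields showing how a monitor sees itself and its peers."""
    return {
        "name": status.get("name"),
        "state": status.get("state"),
        "quorum": status.get("quorum", []),
        "outside_quorum": status.get("outside_quorum", []),
    }


def query_mon(mon_id, timeout=30):
    """Query the local admin socket; raises if the monitor does not answer in time."""
    out = subprocess.check_output(mon_status_cmd(mon_id), timeout=timeout)
    return summarize(json.loads(out.decode("utf-8")))


# Example, on the host that runs the monitor (the socket is local to it):
#   print(query_mon("computer06"))
```

The timeout is a deliberate choice: a monitor that is still busy in startup may not answer on its socket promptly, and a bounded wait makes that visible instead of leaving the query hanging.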

In this case, I think maybe mon.computer06 is trying to examine its
whole store, but 100GB is a lot (way too much, in fact), so this can
take a long time.
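
To see how large the store on each monitor host actually is, something like the following Python sketch may help. The store path in the usage comment is an assumption based on the usual default monitor data directory and may differ on a given cluster; the function names are illustrative:

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            # Skip symlinks so nothing outside the store is counted.
            if os.path.isfile(fp) and not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def to_gib(nbytes):
    """Convert a byte count to GiB."""
    return nbytes / float(1024 ** 3)


# Example (path is an assumption, adjust to the local "mon data" setting):
#   store = "/var/lib/ceph/mon/ceph-computer06/store.db"
#   print("%.1f GiB" % to_gib(dir_size_bytes(store)))
```

Comparing this figure across the three monitors shows whether computer06 is still carrying the large uncompacted store.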


 Sorry, my English is not good.
