Oops, we forgot to disable firewalld on the new CentOS 7 VM. Once firewalld was disabled, replication finished in seconds, as shown below.

```
I1125 18:27:33.737843 2490 replica.cpp:369] Replica ignoring promise request as it is in RECOVERING status
I1125 18:27:33.740927 2489 replica.cpp:655] Replica received learned notice for position 984
I1125 18:27:33.741539 2489 leveldb.cpp:343] Persisting action (20 bytes) to leveldb took 572913ns
I1125 18:27:33.741628 2489 replica.cpp:676] Persisted action at 984
I1125 18:27:33.741673 2489 replica.cpp:661] Replica learned TRUNCATE action at position 984
I1125 18:27:33.742463 2491 recover.cpp:554] Updating replica status to VOTING
I1125 18:27:33.743335 2490 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 739871ns
I1125 18:27:33.743413 2490 replica.cpp:320] Persisted replica status to VOTING
I1125 18:27:33.743522 2490 recover.cpp:568] Successfully joined the Paxos group
I1125 18:27:33.743727 2490 recover.cpp:452] Recover process terminated
```

--
Thanks,
Chengwei

On Wed, Nov 25, 2015 at 06:25:51PM +0800, Chengwei Yang wrote:
> Meanwhile, the other 2 mesos-masters (one leader and one follower) both keep
> repeating the log below.
>
> ```
> I1125 18:06:33.315208 28401 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
> I1125 18:06:43.316341 28404 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
> I1125 18:06:53.318739 28399 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
> I1125 18:07:03.321287 28403 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
> ```
>
> It seems the new mesos-master cannot catch up and keeps retrying. Is this a
> bug?
>
> I'm using mesos-0.21.0 on CentOS 7, the vanilla RPM released by Mesosphere.
>
> --
> Thanks,
> Chengwei
>
>
> On Wed, Nov 25, 2015 at 05:45:56PM +0800, Chengwei Yang wrote:
> > Hi All,
> >
> > I did step 1 below and checked the logs from the newly started mesos-master;
> > it continuously complains as shown below.
> >
> > ```
> > I1125 17:42:59.066706 2330 recover.cpp:188] Received a recover response from a replica in EMPTY status
> > I1125 17:43:09.065188 2331 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
> > I1125 17:43:09.066992 2330 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
> > I1125 17:43:09.067425 2324 recover.cpp:188] Received a recover response from a replica in EMPTY status
> > I1125 17:43:19.067332 2331 recover.cpp:111] Unable to finish the recover protocol in 10secs, retrying
> > I1125 17:43:19.069587 2323 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
> > I1125 17:43:19.069807 2323 recover.cpp:188] Received a recover response from a replica in EMPTY status
> > ```
> >
> > It seems it cannot catch up with the other replicas?
> >
> > --
> > Thanks,
> > Chengwei
> >
> > On Tue, Nov 24, 2015 at 09:47:16AM +0800, Chengwei Yang wrote:
> > > Hi all,
> > >
> > > We're using Mesos in production on CentOS 6 and plan to upgrade CentOS
> > > to 7.1. To avoid affecting any tasks running on Mesos, we're going to
> > > replace all mesos-masters on the fly.
> > >
> > > The procedure is listed below:
> > >
> > > 0. 3 mesos-masters running on CentOS 6
> > > 1. Shut down 1 mesos-master (CentOS 6) and bring up 1 mesos-master
> > >    (CentOS 7), then wait for the new master to sync for some time (is
> > >    there any simple way to know when?)
> > > 2. Repeat step 1
> > >
> > > NOTE: we plan to shut down the non-leaders first, and shut down the
> > > leader (CentOS 6) last.
> > >
> > > Can we do it this way? Or are there any better suggestions?
> > >
> > > --
> > > Thanks,
> > > Chengwei
> > >
> >

